chaokunyang commented on code in PR #1413: URL: https://github.com/apache/incubator-fury/pull/1413#discussion_r1536812554
##########
docs/protocols/xlang_object_graph_spec.md:
##########
@@ -0,0 +1,612 @@
+# Cross language object graph serialization
+
+Fury xlang serialization is an automatic object serialization framework that supports reference and polymorphism.
+Fury will convert an object from/to fury xlang serialization binary format.
+Fury has two core concepts for xlang serialization:
+
+- **Fury xlang binary format**
+- **Framework implemented in different languages to convert object to/from Fury xlang binary format**
+
+The serialization format is a dynamic binary format. The dynamics and reference/polymorphism support make Fury flexible,
+much more easy to use, but
+also introduce more complexities compared to static serialization frameworks. So the format will be more complex.
+
+## Type Systems
+
+### Data Types
+
+- bool: A boolean value (true or false).
+- byte: An 8-bit signed integer.
+- i16: A 16-bit signed integer.
+- i32: A 32-bit signed integer.
+- i64: A 64-bit signed integer.
+- half-float: A 16-bit floating point number.
+- float: A 32-bit floating point number.
+- double: A 64-bit floating point number including NaN and Infinity.
+- string: A text string encoded using Latin1/UTF16/UTF-8 encoding.

Review Comment:
> Is Latin1 still widely used? And how about UTF-32?

UTF-32 uses 4 bytes per char, which would bloat the data a lot. In practice most chars can be expressed using Latin1, and many languages such as Java/Python/JavaScript support `Latin1/UTF-16` natively, so we added these encodings here. When a language supports the `Latin1/UTF-16` encoding natively, we can skip the encoding/decoding cost and use a memory copy to create the string object. Languages like Rust/Golang use UTF-8 for strings, and they can still use a copy to create a string object from the serialized data if the data is encoded as UTF-8. But if the peer language is Java and the data was written as Latin1/UTF-16, a conversion from Latin1/UTF-16 to UTF-8 would be needed on their side.
But that's OK: if we used UTF-8 encoding only, Java would have to encode Latin1/UTF-16 to UTF-8 during serialization, so the cost wouldn't go away.
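To make the size argument concrete, here is a minimal sketch in Python. The helper names (`fits_latin1`, `choose_encoding`, `encoded_size`) are hypothetical and are not part of Fury's API, and the policy of falling back to UTF-16 when a string does not fit Latin1 is an assumption made for illustration only; it is not what the spec mandates.

```python
def fits_latin1(text: str) -> bool:
    """True if every code point fits in one Latin1 byte (U+0000..U+00FF)."""
    return all(ord(ch) <= 0xFF for ch in text)


def choose_encoding(text: str) -> str:
    """Hypothetical writer policy: Latin1 when possible, otherwise UTF-16.

    A UTF-8-native writer (e.g. Rust/Go) would presumably keep UTF-8 instead.
    """
    return "latin-1" if fits_latin1(text) else "utf-16-le"


def encoded_size(text: str, encoding: str) -> int:
    """Number of payload bytes for `text` in the given encoding."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    # ASCII-only, Latin1-only, and CJK samples (escapes keep the source ASCII).
    for sample in ("hello fury", "caf\u00e9", "\u4f60\u597d"):
        chosen = choose_encoding(sample)
        print(
            repr(sample),
            chosen,
            encoded_size(sample, chosen),
            "utf-8:", encoded_size(sample, "utf-8"),
            "utf-32:", encoded_size(sample, "utf-32-le"),
        )
```

For the Latin1-only sample the chosen encoding needs one byte per char, while UTF-32 always needs four, which is the bloat mentioned above.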
