Re: [PR] feat(spec): standardizing fury cross-language serialization specification [incubator-fury]

via GitHub Sun, 24 Mar 2024 06:09:39 -0700


chaokunyang commented on code in PR #1413:
URL: https://github.com/apache/incubator-fury/pull/1413#discussion_r1536812554



##########
docs/protocols/xlang_object_graph_spec.md:
##########
@@ -0,0 +1,612 @@
+# Cross language object graph serialization
+
+Fury xlang serialization is an automatic object serialization framework that 
supports reference and polymorphism.
+Fury will convert an object from/to fury xlang serialization binary format.
+Fury has two core concepts for xlang serialization:
+
+- **Fury xlang binary format**
+- **Framework implemented in different languages to convert object to/from 
Fury xlang binary format**
+
+The serialization format is a dynamic binary format. The dynamics and 
reference/polymorphism support make Fury flexible,
+much more easy to use, but
+also introduce more complexities compared to static serialization frameworks. 
So the format will be more complex.
+
+## Type Systems
+
+### Data Types
+
+- bool: A boolean value (true or false).
+- byte: An 8-bit signed integer.
+- i16: A 16-bit signed integer.
+- i32: A 32-bit signed integer.
+- i64: A 64-bit signed integer.
+- half-float: A 16-bit floating point number.
+- float: A 32-bit floating point number.
+- double: A 64-bit floating point number including NaN and Infinity.
+- string: A text string encoded using Latin1/UTF16/UTF-8 encoding.

Review Comment:
   > Is Latin1 still widely used? And how about UTF-32?
   
   UTF-32 uses 4 bytes per char, which would bloat the data a lot.
   
   Actually, most chars can be expressed using Latin1, and many 
languages such as Java/Python/JavaScript support `Latin1/UTF-16` natively, so 
we add these encodings here. When a language supports the 
`Latin1/UTF-16` encoding natively, we can skip the encoding/decoding cost and 
use a memory copy to create the string object.
   
   Languages like Rust/Golang use UTF-8 for string encoding, so they can still 
use a copy to create a string object from the serialized data if the data is 
encoded using UTF-8. But if the peer language is Java, a conversion from 
Latin1/UTF-16 to UTF-8 would be needed. That's OK, because if we used UTF-8 
encoding only, Java would need to encode Latin1/UTF-16 to UTF-8 during 
serialization, so the cost wouldn't go away.
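
To make the size argument concrete, here is a minimal Python sketch. It is not Fury's implementation: the helper names `pick_compact_encoding` and `encoded_size` are hypothetical, and the "Latin1 if every char fits in one byte, otherwise UTF-16" rule is only an assumed illustration of the trade-off described above.

```python
def pick_compact_encoding(text: str) -> str:
    """Hypothetical helper: prefer Latin1 when every char fits in one byte,
    otherwise fall back to UTF-16 (little-endian, no BOM)."""
    try:
        text.encode("latin-1")
        return "latin-1"
    except UnicodeEncodeError:
        return "utf-16-le"


def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    for sample in ("hello", "caf\u00e9", "\u4f60\u597d"):
        chosen = pick_compact_encoding(sample)
        print(
            repr(sample),
            "chosen:", chosen,
            "chosen bytes:", encoded_size(sample, chosen),
            "utf-8 bytes:", encoded_size(sample, "utf-8"),
            "utf-32 bytes:", encoded_size(sample, "utf-32-le"),
        )
```

For ASCII-only text, Latin1 and UTF-8 are the same size while UTF-32 is four times larger. For text outside Latin1, UTF-16 and UTF-8 trade places depending on the characters involved.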
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

