chaokunyang commented on code in PR #2414:
URL: https://github.com/apache/fory/pull/2414#discussion_r2427685304
##########
java/fory-format/README.md:
##########
@@ -8,12 +8,52 @@ Fory row format is heavily inspired by spark tungsten row format, but with chang
- Decimals use the Arrow decimal format.
- Variable-size fields can be stored inline in the fixed-size region if they are small enough.
- Padding can be skipped by generating Row classes ahead of time (AOT) that embed field offsets in the generated code.
-- Support adding fields without breaking compatibility.
-The initial fory java row data structure implementation is modified from spark unsafe row/writer.
+The initial Fory Java row data structure implementation is adapted from Spark's unsafe row/writer.
See the `Encoders.bean` Javadoc for a list of built-in supported types.
+## Row Format in Java
+
+To begin using the row format from Java, start with the `Encoders` class:
+
+```java
+import java.util.function.Supplier;
+import org.apache.fory.format.encoder.Encoders;
+import org.apache.fory.format.encoder.RowEncoder;
+
+// Many built-in types and collections are supported
+public record MyRecord(int key, String value) {}
+
+// The encoder supplier is relatively expensive to create.
+// It is thread-safe and should be reused.
+Supplier<RowEncoder<MyRecord>> encoderFactory =
+    Encoders.buildBeanCodec(MyRecord.class)
+        .build();
+
+// Each individual encoder is relatively cheap to create
+// It is not thread-safe, but may be reused by the same thread
+var encoder = encoderFactory.get();
+byte[] encoded = encoder.encode(new MyRecord(42, "Test"));
+
+MyRecord deserialized = encoder.decode(encoded);
+```
+
+## Compact Format
+
+The default row format is cross-language compatible and alignment-padded for maximum performance.
+When data size is a greater concern, the compact format provides an alternative encoding that uses
+significantly less space.
+
+Optimizations include:
+
+- struct stores fixed-size fields (e.g. Int128, FixedSizeBinary) inline in the fixed-data area without an offset + size
+- struct whose fields are all fixed-size is itself considered fixed-size, so it can be stored inline in another struct or array
+- struct skips the null bitmap if all fields are non-nullable
+- struct sorts fields by fixed size for best-effort (but not guaranteed) alignment
+- struct can use fewer than 8 bytes for small data (int, short, etc.)
+- struct null bitmap is stored at the end of the struct to borrow alignment padding if possible
+- array stores fixed-size elements inline in the fixed-data area without an offset + size
+- array header uses 4 bytes for the size (since `Collection` and array sizes are limited to `int`) and leaves the remaining 4 bytes for the start of the null bitmap
+
Review Comment:
Could we add a minimal example of how to use compact mode?
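The comments in the Java usage example say that the encoder supplier is thread-safe and each individual encoder is not. In a multi-threaded service, one common way to follow that rule is `ThreadLocal.withInitial(encoderFactory)`. `withInitial` accepts a `Supplier` directly, so each thread lazily gets its own encoder and then reuses it. The sketch below is self-contained so it can run without Fory on the classpath: a `Supplier<StringBuilder>` stands in for the `Supplier<RowEncoder<MyRecord>>`. The names `PerThreadEncoders`, `FACTORY`, and `ENCODER` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Sketch of the "one encoder per thread" pattern. FACTORY stands in for a shared
 * Supplier<RowEncoder<T>>; it hands out StringBuilders so the example runs standalone.
 */
public class PerThreadEncoders {

  /** Counts how many instances the shared factory has created. */
  static final AtomicInteger CREATED = new AtomicInteger();

  /** Thread-safe factory: create it once and share it across threads. */
  static final Supplier<StringBuilder> FACTORY =
      () -> {
        CREATED.incrementAndGet();
        return new StringBuilder();
      };

  /** Each thread gets its own instance on its first get() and reuses it afterwards. */
  static final ThreadLocal<StringBuilder> ENCODER = ThreadLocal.withInitial(FACTORY);

  public static void main(String[] args) throws InterruptedException {
    Runnable work =
        () -> {
          StringBuilder first = ENCODER.get();
          StringBuilder second = ENCODER.get();
          if (first != second) {
            throw new IllegalStateException("expected reuse within a thread");
          }
        };
    Thread t1 = new Thread(work);
    Thread t2 = new Thread(work);
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println("instances created: " + CREATED.get()); // 2: one per worker thread
  }
}
```

With Fory, the equivalent line would be `ThreadLocal<RowEncoder<MyRecord>> encoders = ThreadLocal.withInitial(encoderFactory);`, which uses only the types already shown in the README snippet. With thread pools, keep in mind that the encoder lives as long as the pooled thread.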
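This hunk does not show the builder option that enables compact mode, so this sketch does not guess at it. What can be illustrated is the size arithmetic behind the compact-format bullet list. The block below is a hypothetical model, not Fory's layout code. It assumes the following:

- The default format uses a Tungsten-style null bitmap rounded up to 8-byte words, plus one 8-byte slot per field.
- Compact mode stores fixed-size fields at their natural width, sorted by descending width.
- Compact mode adds a byte-granular null bitmap after the fields and drops it when no field is nullable.

`RowSizeSketch`, `Field`, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical size model contrasting a Tungsten-style default row with a compact row.
 * This is not Fory's implementation: widths, ordering, and bitmap rules are assumptions.
 */
public class RowSizeSketch {

  /** A fixed-size field with its natural width in bytes and its nullability. */
  record Field(String name, int width, boolean nullable) {}

  /** Assumed default layout: null bitmap in 8-byte words, then one 8-byte slot per field. */
  static int defaultFixedSize(List<Field> fields) {
    int bitmapBytes = ((fields.size() + 63) / 64) * 8;
    return bitmapBytes + 8 * fields.size();
  }

  /**
   * Assumed compact layout: fields at natural width, plus a byte-granular null bitmap
   * (placed after the fields) that is omitted when no field is nullable.
   */
  static int compactFixedSize(List<Field> fields) {
    int fieldBytes = fields.stream().mapToInt(Field::width).sum();
    boolean anyNullable = fields.stream().anyMatch(Field::nullable);
    return anyNullable ? fieldBytes + (fields.size() + 7) / 8 : fieldBytes;
  }

  /** Offsets after sorting by descending width, which keeps power-of-two widths aligned. */
  static Map<String, Integer> compactOffsets(List<Field> fields) {
    List<Field> sorted = new ArrayList<>(fields);
    // List.sort is stable, so fields of equal width keep their declaration order.
    sorted.sort((a, b) -> Integer.compare(b.width(), a.width()));
    Map<String, Integer> offsets = new LinkedHashMap<>();
    int offset = 0;
    for (Field f : sorted) {
      offsets.put(f.name(), offset);
      offset += f.width();
    }
    return offsets;
  }

  public static void main(String[] args) {
    // Roughly a record with int key, long id, short count, all primitive and non-null.
    List<Field> fields =
        List.of(new Field("key", 4, false), new Field("id", 8, false), new Field("count", 2, false));
    System.out.println("default bytes: " + defaultFixedSize(fields)); // 8 bitmap + 3 * 8 = 32
    System.out.println("compact bytes: " + compactFixedSize(fields)); // 4 + 8 + 2 = 14, no bitmap
    System.out.println("compact offsets: " + compactOffsets(fields)); // {id=0, key=8, count=12}
  }
}
```

Under these assumptions, the compact fixed-size region is less than half the default size. Marking one field nullable adds a single bitmap byte (15 bytes), while the default layout stays at 32. Actual numbers depend on Fory's real layout rules.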
--
This is an automated message from the Apache Git Service.