This is an automated email from the ASF dual-hosted git repository. chaokunyang pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/fory-site.git
commit c4ccc2f28ac420692f3432f16ae4b55a545dcccf Author: chaokunyang <[email protected]> AuthorDate: Fri Jan 23 14:52:29 2026 +0000 🔄 synced local 'docs/specification/' with remote 'docs/specification/' --- docs/specification/xlang_serialization_spec.md | 151 +++++++++++++++++++++---- 1 file changed, 126 insertions(+), 25 deletions(-) diff --git a/docs/specification/xlang_serialization_spec.md b/docs/specification/xlang_serialization_spec.md index 34bdd581a..530edbac5 100644 --- a/docs/specification/xlang_serialization_spec.md +++ b/docs/specification/xlang_serialization_spec.md @@ -85,6 +85,8 @@ This specification defines the Fory xlang binary format. The format is dynamic r - float32_array: one dimensional float32 array. - float64_array: one dimensional float64 array. - union: a tagged union type that can hold one of several alternative types. The active alternative is identified by an index. +- typed_union: a union value with embedded numeric union type ID. +- named_union: a union value with embedded union type name or shared TypeDef. - none: represents an empty/unit value with no data (e.g., for empty union alternatives). Note: @@ -190,30 +192,32 @@ Named types (`NAMED_*`) do not embed a user ID; their names are carried in metad | 28 | NAMED_COMPATIBLE_STRUCT | Struct with schema evolution (by name) | | 29 | EXT | Extension type registered by numeric ID | | 30 | NAMED_EXT | Extension type registered by namespace + type name | -| 31 | UNION | Tagged union type (one of several alternatives) | -| 32 | NONE | Empty/unit type (no data) | -| 33 | DURATION | Time duration (seconds + nanoseconds) | -| 34 | TIMESTAMP | Point in time (nanoseconds since epoch) | -| 35 | LOCAL_DATE | Date without timezone (days since epoch) | -| 36 | DECIMAL | Arbitrary precision decimal | -| 37 | BINARY | Raw binary data | -| 38 | ARRAY | Generic array type | -| 39 | BOOL_ARRAY | 1D boolean array | -| 40 | INT8_ARRAY | 1D int8 array | -| 41 | INT16_ARRAY | 1D int16 array | -| 42 | INT32_ARRAY | 1D int32 array | -| 43 | INT64_ARRAY | 1D int64 array | -| 44 | UINT8_ARRAY | 1D uint8 array | -| 45 | UINT16_ARRAY | 1D uint16 array | -| 46 | UINT32_ARRAY | 1D uint32 array | -| 47 | UINT64_ARRAY | 1D uint64 array | -| 48 | FLOAT16_ARRAY | 1D float16 array | -| 49 | FLOAT32_ARRAY | 1D float32 array | -| 50 | FLOAT64_ARRAY | 1D float64 array | +| 31 | UNION | Union value, schema identity not embedded | +| 32 | TYPED_UNION | Union with embedded numeric union type ID | +| 33 | NAMED_UNION | Union with embedded union type name/TypeDef | +| 34 | NONE | Empty/unit type (no data) | +| 35 | DURATION | Time duration (seconds + nanoseconds) | +| 36 | TIMESTAMP | Point in time (nanoseconds since epoch) | +| 37 | LOCAL_DATE | Date without timezone (days since epoch) | +| 38 | DECIMAL | Arbitrary precision decimal | +| 39 | BINARY | Raw binary data | +| 40 | ARRAY | Generic array type | +| 41 | BOOL_ARRAY | 1D boolean array | +| 42 | INT8_ARRAY | 1D int8 array | +| 43 | INT16_ARRAY | 1D int16 array | +| 44 | INT32_ARRAY | 1D int32 array | +| 45 | INT64_ARRAY | 1D int64 array | +| 46 | UINT8_ARRAY | 1D uint8 array | +| 47 | UINT16_ARRAY | 1D uint16 array | +| 48 | UINT32_ARRAY | 1D uint32 array | +| 49 | UINT64_ARRAY | 1D uint64 array | +| 50 | FLOAT16_ARRAY | 1D float16 array | +| 51 | FLOAT32_ARRAY | 1D float32 array | +| 52 | FLOAT64_ARRAY | 1D float64 array | #### Type ID Encoding for User Types -When registering user types (struct/ext/enum), the full type ID combines user ID and internal type ID: +When registering user types (struct/ext/enum/union), the full type ID combines user ID and internal type ID: ``` Full Type ID = (user_type_id << 8) | internal_type_id @@ -415,7 +419,7 @@ followed by optional type-specific metadata. - Internal types use their internal type ID directly (low 8 bits). - User-registered types use a full type ID: `(user_type_id << 8) | internal_type_id`. - `user_type_id` is a numeric ID (0-4095 in current implementations). - - `internal_type_id` is one of `ENUM`, `STRUCT`, `COMPATIBLE_STRUCT`, or `EXT`. + - `internal_type_id` is one of `ENUM`, `STRUCT`, `COMPATIBLE_STRUCT`, `EXT`, or `UNION`. - Named types do not embed a user ID. They use `NAMED_*` internal type IDs and carry a namespace and type name (or shared TypeDef) instead. @@ -423,13 +427,14 @@ followed by optional type-specific metadata. After the type ID: -- **ENUM / STRUCT / EXT**: no extra bytes (registration by ID required on both sides). +- **ENUM / STRUCT / EXT / TYPED_UNION**: no extra bytes (registration by ID required on both sides). - **COMPATIBLE_STRUCT**: - If meta share is enabled, write a shared TypeDef entry (see below). - If meta share is disabled, no extra bytes. -- **NAMED_ENUM / NAMED_STRUCT / NAMED_COMPATIBLE_STRUCT / NAMED_EXT**: +- **NAMED_ENUM / NAMED_STRUCT / NAMED_COMPATIBLE_STRUCT / NAMED_EXT / NAMED_UNION**: - If meta share is disabled, write `namespace` and `type_name` as meta strings. - If meta share is enabled, write a shared TypeDef entry (see below). +- **UNION**: no extra bytes at this layer. - **LIST / SET / MAP / ARRAY / primitives**: no extra bytes at this layer. Unregistered types are serialized as named types: @@ -437,6 +442,7 @@ Unregistered types are serialized as named types: - Enums -> `NAMED_ENUM` - Struct-like classes -> `NAMED_STRUCT` (or `NAMED_COMPATIBLE_STRUCT` when meta share is enabled) - Custom extension types -> `NAMED_EXT` +- Unions -> `NAMED_UNION` The namespace is the package/module name and the type name is the simple class name. @@ -1246,7 +1252,8 @@ Assign each field to exactly one group in the following order: 1. **Primitive (non-nullable)**: primitive or boxed numeric/boolean types with `nullable=false`. 2. **Primitive (nullable)**: primitive or boxed numeric/boolean types with `nullable=true`. 3. **Built-in (non-container)**: internal type IDs that are not user-defined and not UNKNOWN, - excluding collections and maps (for example: STRING, TIME types, UNION, primitive arrays). + excluding collections and maps (for example: STRING, TIME types, UNION/TYPED_UNION/NAMED_UNION, + primitive arrays). 4. **Collection**: list/set/object-array fields. Non-primitive arrays are treated as LIST for ordering purposes. 5. **Map**: map fields. @@ -1312,6 +1319,100 @@ The field value layout is the same as schema-consistent mode, but the type meta TypeDef to map fields by name or tag ID and to honor nullable/ref flags from metadata; unknown fields are skipped. +### Union + +Union values are encoded using three union type IDs so the union schema identity lives in type meta (like +`STRUCT/ENUM/EXT`) and is easy to carry inside `Any`. + +#### IDL syntax + +```fdl +union Contact [id=0] { + string email = 1; + int32 phone = 2; +} +``` + +Rules: + +- Each union alternative MUST have a stable tag number (`= 1`, `= 2`, ...). +- Tag numbers MUST be unique within the union and MUST NOT be reused. + +#### Type IDs and type meta + +| Type ID | Name | Meaning | +| ------: | ----------- | ---------------------------------------------------- | +| 31 | UNION | Union value, schema identity not embedded | +| 32 | TYPED_UNION | Union value with embedded registered numeric type ID | +| 33 | NAMED_UNION | Union value with embedded type name / shared TypeDef | + +Type meta encoding: + +- `UNION (31)`: no additional type meta payload. +- `TYPED_UNION (32)`: no additional type meta payload (numeric ID is carried in the full type ID itself). +- `NAMED_UNION (33)`: followed by named type meta (namespace + type name, or shared TypeDef marker/body). + +#### Union value payload + +A union payload is: + +``` +| case_id (varuint32) | case_value (Any-style value) | +``` + +`case_id` is the union alternative tag number. + +`case_value` MUST be encoded as a full xlang value: + +``` +| field_ref_meta | field_value_type_meta | field_value_bytes | +``` + +This is required even for primitives so unknown alternatives can be skipped safely. + +#### Wire layouts + +**UNION (schema known from context)** + +``` +| ... outer ref meta ... | type_id=UNION(31) | case_id | case_value | +``` + +**TYPED_UNION (schema embedded by numeric id)** + +``` +| ... outer ref meta ... | embedded type id | case_id | case_value | +``` + +embedded type id: `type_id=(user_type_id << 8) | TYPED_UNION(32)` + +**NAMED_UNION (schema embedded by name/typedef)** + +``` +| ... outer ref meta ... | type_id=NAMED_UNION(33) | name_or_typedef | case_id | case_value | +``` + +#### Decoding rules + +1. Read outer ref meta and `type_id`. +2. If `TYPED_UNION`, resolve the union schema from the full type ID. +3. If `NAMED_UNION`, read named type meta and resolve the union schema. +4. Read `case_id`. +5. Read `case_value` as Any-style value (ref meta + type meta + value). + +If `case_id` is unknown, the decoder MUST still consume the case value using `field_value_type_meta` and +standard `skipValue(type_id)`. + +#### When to use each type ID + +- Use `UNION` when the union schema is known from context. +- Use `TYPED_UNION` for dynamic containers when numeric registration is available. +- Use `NAMED_UNION` when name-based resolution is preferred or required. + +#### Compatibility notes + +- `case_id` is a stable identifier; added alternatives are forward compatible and unknown cases can be skipped. + ### Type Type will be serialized using type meta format. --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
