(fory-site) 02/02: 🔄 synced local 'docs/specification/' with remote 'docs/specification/'

chaokunyang Tue, 03 Feb 2026 23:27:51 -0800

This is an automated email from the ASF dual-hosted git repository.

chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory-site.git


commit a18bb371242a6d6e15b06c7522338aa365e098dc
Author: chaokunyang <[email protected]>
AuthorDate: Wed Feb 4 07:27:39 2026 +0000

    🔄 synced local 'docs/specification/' with remote 'docs/specification/'
---
 docs/specification/java_serialization_spec.md  | 115 ++++++----------
 docs/specification/xlang_serialization_spec.md | 174 ++++++++++++++-----------
 docs/specification/xlang_type_mapping.md       |  95 +++++++++-----
 3 files changed, 204 insertions(+), 180 deletions(-)

diff --git a/docs/specification/java_serialization_spec.md b/docs/specification/java_serialization_spec.md
index 5e72d4226d..a6957370b5 100644
--- a/docs/specification/java_serialization_spec.md
+++ b/docs/specification/java_serialization_spec.md
@@ -83,84 +83,55 @@ full_type_id = (user_type_id << 8) | internal_type_id
 - Named types use `NAMED_*` internal IDs and carry names in metadata rather than embedding a user
   ID.
 
-### Shared internal type IDs (0-32)
-
-Java native mode shares the xlang internal IDs for basic types and user-defined enum/struct/ext
-tags. These IDs are stable across languages.
-
-| Type ID | Name                    |
-| ------- | ----------------------- |
-| 0       | UNKNOWN                 |
-| 1       | BOOL                    |
-| 2       | INT8                    |
-| 3       | INT16                   |
-| 4       | INT32                   |
-| 5       | VARINT32                |
-| 6       | INT64                   |
-| 7       | VARINT64                |
-| 8       | TAGGED_INT64            |
-| 9       | UINT8                   |
-| 10      | UINT16                  |
-| 11      | UINT32                  |
-| 12      | VAR_UINT32              |
-| 13      | UINT64                  |
-| 14      | VAR_UINT64              |
-| 15      | TAGGED_UINT64           |
-| 16      | FLOAT16                 |
-| 17      | FLOAT32                 |
-| 18      | FLOAT64                 |
-| 19      | STRING                  |
-| 20      | LIST                    |
-| 21      | SET                     |
-| 22      | MAP                     |
-| 23      | ENUM                    |
-| 24      | NAMED_ENUM              |
-| 25      | STRUCT                  |
-| 26      | COMPATIBLE_STRUCT       |
-| 27      | NAMED_STRUCT            |
-| 28      | NAMED_COMPATIBLE_STRUCT |
-| 29      | EXT                     |
-| 30      | NAMED_EXT               |
-| 31      | UNION                   |
-| 32      | NONE                    |
+### Shared internal type IDs (0-63)
+
+Java native mode shares the xlang internal IDs for all values below 64. IDs `0~56` are defined by
+the xlang spec, while `57~63` are reserved for future internal use. These IDs are stable across
+languages.
+
+See the internal type ID table in
+[Xlang Serialization Format](xlang_serialization_spec.md#internal-type-id-table).
+Java shares all IDs `< 64`, with `57~63` reserved for future internal use.
 
 ### Java native built-in type IDs
 
-Java native serialization assigns Java-specific built-ins starting at `Types.NONE + 1`.
-Type IDs greater than 32 are not shared with xlang; they are only valid in Java native mode.
+Java native serialization assigns Java-specific built-ins starting at
+`Types.BOUND + 5` (`Types.BOUND` is 64; 5 IDs are reserved for future use).
+Type IDs in `0~56` are shared with xlang; `57~63` are reserved; `64+` are only
+valid in Java native mode.
 
 | Type ID | Name                       | Description                    |
 | ------- | -------------------------- | ------------------------------ |
-| 33      | VOID_ID                    | java.lang.Void                 |
-| 34      | CHAR_ID                    | java.lang.Character            |
-| 35      | PRIMITIVE_VOID_ID          | void                           |
-| 36      | PRIMITIVE_BOOL_ID          | boolean                        |
-| 37      | PRIMITIVE_INT8_ID          | byte                           |
-| 38      | PRIMITIVE_CHAR_ID          | char                           |
-| 39      | PRIMITIVE_INT16_ID         | short                          |
-| 40      | PRIMITIVE_INT32_ID         | int                            |
-| 41      | PRIMITIVE_FLOAT32_ID       | float                          |
-| 42      | PRIMITIVE_INT64_ID         | long                           |
-| 43      | PRIMITIVE_FLOAT64_ID       | double                         |
-| 44      | PRIMITIVE_BOOLEAN_ARRAY_ID | boolean[]                      |
-| 45      | PRIMITIVE_BYTE_ARRAY_ID    | byte[]                         |
-| 46      | PRIMITIVE_CHAR_ARRAY_ID    | char[]                         |
-| 47      | PRIMITIVE_SHORT_ARRAY_ID   | short[]                        |
-| 48      | PRIMITIVE_INT_ARRAY_ID     | int[]                          |
-| 49      | PRIMITIVE_FLOAT_ARRAY_ID   | float[]                        |
-| 50      | PRIMITIVE_LONG_ARRAY_ID    | long[]                         |
-| 51      | PRIMITIVE_DOUBLE_ARRAY_ID  | double[]                       |
-| 52      | STRING_ARRAY_ID            | String[]                       |
-| 53      | OBJECT_ARRAY_ID            | Object[]                       |
-| 54      | ARRAYLIST_ID               | java.util.ArrayList            |
-| 55      | HASHMAP_ID                 | java.util.HashMap              |
-| 56      | HASHSET_ID                 | java.util.HashSet              |
-| 57      | CLASS_ID                   | java.lang.Class                |
-| 58      | EMPTY_OBJECT_ID            | empty object stub              |
-| 59      | LAMBDA_STUB_ID             | lambda stub                    |
-| 60      | JDK_PROXY_STUB_ID          | JDK proxy stub                 |
-| 61      | REPLACE_STUB_ID            | writeReplace/readResolve stub  |
-| 62      | NONEXISTENT_META_SHARED_ID | meta-shared unknown class stub |
+| 69      | VOID_ID                    | java.lang.Void                 |
+| 70      | CHAR_ID                    | java.lang.Character            |
+| 71      | PRIMITIVE_VOID_ID          | void                           |
+| 72      | PRIMITIVE_BOOL_ID          | boolean                        |
+| 73      | PRIMITIVE_INT8_ID          | byte                           |
+| 74      | PRIMITIVE_CHAR_ID          | char                           |
+| 75      | PRIMITIVE_INT16_ID         | short                          |
+| 76      | PRIMITIVE_INT32_ID         | int                            |
+| 77      | PRIMITIVE_FLOAT32_ID       | float                          |
+| 78      | PRIMITIVE_INT64_ID         | long                           |
+| 79      | PRIMITIVE_FLOAT64_ID       | double                         |
+| 80      | PRIMITIVE_BOOLEAN_ARRAY_ID | boolean[]                      |
+| 81      | PRIMITIVE_BYTE_ARRAY_ID    | byte[]                         |
+| 82      | PRIMITIVE_CHAR_ARRAY_ID    | char[]                         |
+| 83      | PRIMITIVE_SHORT_ARRAY_ID   | short[]                        |
+| 84      | PRIMITIVE_INT_ARRAY_ID     | int[]                          |
+| 85      | PRIMITIVE_FLOAT_ARRAY_ID   | float[]                        |
+| 86      | PRIMITIVE_LONG_ARRAY_ID    | long[]                         |
+| 87      | PRIMITIVE_DOUBLE_ARRAY_ID  | double[]                       |
+| 88      | STRING_ARRAY_ID            | String[]                       |
+| 89      | OBJECT_ARRAY_ID            | Object[]                       |
+| 90      | ARRAYLIST_ID               | java.util.ArrayList            |
+| 91      | HASHMAP_ID                 | java.util.HashMap              |
+| 92      | HASHSET_ID                 | java.util.HashSet              |
+| 93      | CLASS_ID                   | java.lang.Class                |
+| 94      | EMPTY_OBJECT_ID            | empty object stub              |
+| 95      | LAMBDA_STUB_ID             | lambda stub                    |
+| 96      | JDK_PROXY_STUB_ID          | JDK proxy stub                 |
+| 97      | REPLACE_STUB_ID            | writeReplace/readResolve stub  |
+| 98      | NONEXISTENT_META_SHARED_ID | meta-shared unknown class stub |
 
 ### Registration and named types
 
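The ID ranges described in the Java spec hunk above can be sketched as a small classifier. This is an illustrative sketch only, not code from Fory: the function name and the returned labels are made up for this note, and the constants are taken from the prose in the diff (`0~56` shared, `57~63` reserved, `Types.BOUND` = 64, built-ins starting at `Types.BOUND + 5`).

```python
# Hypothetical helper; names and labels are assumptions for illustration.
XLANG_MAX_DEFINED = 56          # highest ID defined by the xlang spec
BOUND = 64                      # Types.BOUND per the diff
JAVA_BUILTIN_START = BOUND + 5  # 69, first Java native built-in (VOID_ID)


def classify_type_id(type_id: int) -> str:
    """Return which range a Java native mode type ID falls into."""
    if type_id < 0:
        raise ValueError("type ID must be non-negative")
    if type_id <= XLANG_MAX_DEFINED:
        return "xlang-shared"
    if type_id < BOUND:
        return "reserved"
    if type_id < JAVA_BUILTIN_START:
        return "java-native-reserved"
    return "java-native"
```

For example, `classify_type_id(27)` (STRUCT) falls in the shared range, while `classify_type_id(69)` (VOID_ID) is Java-native only.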
diff --git a/docs/specification/xlang_serialization_spec.md b/docs/specification/xlang_serialization_spec.md
index 670bfe1119..b611d8fd8a 100644
--- a/docs/specification/xlang_serialization_spec.md
+++ b/docs/specification/xlang_serialization_spec.md
@@ -51,7 +51,9 @@ This specification defines the Fory xlang binary format. The format is dynamic r
 - uint64: a 64-bit unsigned integer.
 - var_uint64: a 64-bit unsigned integer which use fory PVL encoding.
 - tagged_uint64: a 64-bit unsigned integer which use fory Hybrid encoding.
+- float8: an 8-bit floating point number.
 - float16: a 16-bit floating point number.
+- bfloat16: a 16-bit brain floating point number.
 - float32: a 32-bit floating point number.
 - float64: a 64-bit floating point number including NaN and Infinity.
 - string: a text string encoded using Latin1/UTF16/UTF-8 encoding.
@@ -81,11 +83,13 @@ This specification defines the Fory xlang binary format. The format is dynamic r
   - int16_array: one dimensional int16 array.
   - int32_array: one dimensional int32 array.
   - int64_array: one dimensional int64 array.
+  - float8_array: one dimensional float8 array.
   - float16_array: one dimensional half_float_16 array.
+  - bfloat16_array: one dimensional bfloat16 array.
   - float32_array: one dimensional float32 array.
   - float64_array: one dimensional float64 array.
 - union: a tagged union type that can hold one of several alternative types. The active alternative is identified by an index.
-- typed_union: a union value with embedded numeric union type ID.
+- typed_union: a union value with registered numeric union type ID.
 - named_union: a union value with embedded union type name or shared TypeDef.
 - none: represents an empty/unit value with no data (e.g., for empty union alternatives).
 
@@ -150,10 +154,9 @@ Such information can be provided in other languages too:
 
 ### Type ID
 
-All internal data types use an 8-bit internal ID (`0~255`, with `0~50` defined here). Users can
-register types by numeric ID (`0~4095` in current implementations). User IDs are encoded together
-with the internal type ID:
-`(user_type_id << 8) | internal_type_id`.
+All internal data types use an 8-bit internal ID (`0~255`, with `0~56` defined here). Users can
+register types by numeric ID (`0~0xFFFFFFFE` in current implementations). User IDs are encoded
+separately from the internal type ID; there is no bit shifting/packing.
 
 Named types (`NAMED_*`) do not embed a user ID; their names are carried in metadata instead.
 
@@ -177,66 +180,68 @@ Named types (`NAMED_*`) do not embed a user ID; their names are carried in metad
 | 13      | UINT64                  | 64-bit unsigned integer                             |
 | 14      | VAR_UINT64              | Variable-length encoded 64-bit unsigned integer     |
 | 15      | TAGGED_UINT64           | Hybrid encoded 64-bit unsigned integer              |
-| 16      | FLOAT16                 | 16-bit floating point (half precision)              |
-| 17      | FLOAT32                 | 32-bit floating point (single precision)            |
-| 18      | FLOAT64                 | 64-bit floating point (double precision)            |
-| 19      | STRING                  | UTF-8/UTF-16/Latin1 encoded string                  |
-| 20      | LIST                    | Ordered collection (List, Array, Vector)            |
-| 21      | SET                     | Unordered collection of unique elements             |
-| 22      | MAP                     | Key-value mapping                                   |
-| 23      | ENUM                    | Enum registered by numeric ID                       |
-| 24      | NAMED_ENUM              | Enum registered by namespace + type name            |
-| 25      | STRUCT                  | Struct registered by numeric ID (schema consistent) |
-| 26      | COMPATIBLE_STRUCT       | Struct with schema evolution support (by ID)        |
-| 27      | NAMED_STRUCT            | Struct registered by namespace + type name          |
-| 28      | NAMED_COMPATIBLE_STRUCT | Struct with schema evolution (by name)              |
-| 29      | EXT                     | Extension type registered by numeric ID             |
-| 30      | NAMED_EXT               | Extension type registered by namespace + type name  |
-| 31      | UNION                   | Union value, schema identity not embedded           |
-| 32      | TYPED_UNION             | Union with embedded numeric union type ID           |
-| 33      | NAMED_UNION             | Union with embedded union type name/TypeDef         |
-| 34      | NONE                    | Empty/unit type (no data)                           |
-| 35      | DURATION                | Time duration (seconds + nanoseconds)               |
-| 36      | TIMESTAMP               | Point in time (seconds + nanoseconds since epoch)   |
-| 37      | DATE                    | Date without timezone (days since epoch)            |
-| 38      | DECIMAL                 | Arbitrary precision decimal                         |
-| 39      | BINARY                  | Raw binary data                                     |
-| 40      | ARRAY                   | Generic array type                                  |
-| 41      | BOOL_ARRAY              | 1D boolean array                                    |
-| 42      | INT8_ARRAY              | 1D int8 array                                       |
-| 43      | INT16_ARRAY             | 1D int16 array                                      |
-| 44      | INT32_ARRAY             | 1D int32 array                                      |
-| 45      | INT64_ARRAY             | 1D int64 array                                      |
-| 46      | UINT8_ARRAY             | 1D uint8 array                                      |
-| 47      | UINT16_ARRAY            | 1D uint16 array                                     |
-| 48      | UINT32_ARRAY            | 1D uint32 array                                     |
-| 49      | UINT64_ARRAY            | 1D uint64 array                                     |
-| 50      | FLOAT16_ARRAY           | 1D float16 array                                    |
-| 51      | FLOAT32_ARRAY           | 1D float32 array                                    |
-| 52      | FLOAT64_ARRAY           | 1D float64 array                                    |
+| 16      | FLOAT8                  | 8-bit floating point (float8)                       |
+| 17      | FLOAT16                 | 16-bit floating point (half precision)              |
+| 18      | BFLOAT16                | 16-bit brain floating point                         |
+| 19      | FLOAT32                 | 32-bit floating point (single precision)            |
+| 20      | FLOAT64                 | 64-bit floating point (double precision)            |
+| 21      | STRING                  | UTF-8/UTF-16/Latin1 encoded string                  |
+| 22      | LIST                    | Ordered collection (List, Array, Vector)            |
+| 23      | SET                     | Unordered collection of unique elements             |
+| 24      | MAP                     | Key-value mapping                                   |
+| 25      | ENUM                    | Enum registered by numeric ID                       |
+| 26      | NAMED_ENUM              | Enum registered by namespace + type name            |
+| 27      | STRUCT                  | Struct registered by numeric ID (schema consistent) |
+| 28      | COMPATIBLE_STRUCT       | Struct with schema evolution support (by ID)        |
+| 29      | NAMED_STRUCT            | Struct registered by namespace + type name          |
+| 30      | NAMED_COMPATIBLE_STRUCT | Struct with schema evolution (by name)              |
+| 31      | EXT                     | Extension type registered by numeric ID             |
+| 32      | NAMED_EXT               | Extension type registered by namespace + type name  |
+| 33      | UNION                   | Union value, schema identity not embedded           |
+| 34      | TYPED_UNION             | Union value with registered numeric type ID         |
+| 35      | NAMED_UNION             | Union value with embedded type name/TypeDef         |
+| 36      | NONE                    | Empty/unit type (no data)                           |
+| 37      | DURATION                | Time duration (seconds + nanoseconds)               |
+| 38      | TIMESTAMP               | Point in time (seconds + nanoseconds since epoch)   |
+| 39      | DATE                    | Date without timezone (days since epoch)            |
+| 40      | DECIMAL                 | Arbitrary precision decimal                         |
+| 41      | BINARY                  | Raw binary data                                     |
+| 42      | ARRAY                   | Generic array type                                  |
+| 43      | BOOL_ARRAY              | 1D boolean array                                    |
+| 44      | INT8_ARRAY              | 1D int8 array                                       |
+| 45      | INT16_ARRAY             | 1D int16 array                                      |
+| 46      | INT32_ARRAY             | 1D int32 array                                      |
+| 47      | INT64_ARRAY             | 1D int64 array                                      |
+| 48      | UINT8_ARRAY             | 1D uint8 array                                      |
+| 49      | UINT16_ARRAY            | 1D uint16 array                                     |
+| 50      | UINT32_ARRAY            | 1D uint32 array                                     |
+| 51      | UINT64_ARRAY            | 1D uint64 array                                     |
+| 52      | FLOAT8_ARRAY            | 1D float8 array                                     |
+| 53      | FLOAT16_ARRAY           | 1D float16 array                                    |
+| 54      | BFLOAT16_ARRAY          | 1D bfloat16 array                                   |
+| 55      | FLOAT32_ARRAY           | 1D float32 array                                    |
+| 56      | FLOAT64_ARRAY           | 1D float64 array                                    |
 
 #### Type ID Encoding for User Types
 
-When registering user types (struct/ext/enum/union), the full type ID combines user ID and internal type ID:
-
-```
-Full Type ID = (user_type_id << 8) | internal_type_id
-```
+When registering user types (struct/ext/enum/union), the internal type ID is written as the 8-bit
+kind. The user type ID is written separately as an unsigned varint32 (small7); there is no bit
+shift or packing.
 
 **Examples:**
 
-| User ID | Type              | Internal ID | Full Type ID     | Decimal |
-| ------- | ----------------- | ----------- | ---------------- | ------- |
-| 0       | STRUCT            | 25          | `(0 << 8) \| 25` | 25      |
-| 0       | ENUM              | 23          | `(0 << 8) \| 23` | 23      |
-| 1       | STRUCT            | 25          | `(1 << 8) \| 25` | 281     |
-| 1       | COMPATIBLE_STRUCT | 26          | `(1 << 8) \| 26` | 282     |
-| 2       | NAMED_STRUCT      | 27          | `(2 << 8) \| 27` | 539     |
+| User ID | Type              | Internal ID | Encoded User ID | Decimal |
+| ------- | ----------------- | ----------- | --------------- | ------- |
+| 0       | STRUCT            | 27          | 0               | 0       |
+| 0       | ENUM              | 25          | 0               | 0       |
+| 1       | STRUCT            | 27          | 1               | 1       |
+| 1       | COMPATIBLE_STRUCT | 28          | 1               | 1       |
+| 2       | NAMED_STRUCT      | 29          | 2               | 2       |
 
 When reading type IDs:
 
-- Extract internal type: `internal_type_id = full_type_id & 0xFF`
-- Extract user type ID: `user_type_id = full_type_id >> 8`
+- Read internal type ID from the type ID field.
+- If the internal type is a user-registered kind, read `user_type_id` as varuint32.
 
 ### Type mapping
 
@@ -401,9 +406,9 @@ followed by optional type-specific metadata.
 
 - The type ID is written as an unsigned varint32 (small7).
 - Internal types use their internal type ID directly (low 8 bits).
-- User-registered types use a full type ID: `(user_type_id << 8) | internal_type_id`.
-  - `user_type_id` is a numeric ID (0-4095 in current implementations).
-  - `internal_type_id` is one of `ENUM`, `STRUCT`, `COMPATIBLE_STRUCT`, `EXT`, or `UNION`.
+- User-registered types write the internal type ID, then write `user_type_id` as varuint32.
+  - `user_type_id` is a numeric ID (0~0xFFFFFFFE in current implementations).
+  - `internal_type_id` is one of `ENUM`, `STRUCT`, `COMPATIBLE_STRUCT`, `EXT`, or `TYPED_UNION`.
 - Named types do not embed a user ID. They use `NAMED_*` internal type IDs and carry a namespace
   and type name (or shared TypeDef) instead.
 
@@ -411,7 +416,7 @@ followed by optional type-specific metadata.
 
 After the type ID:
 
-- **ENUM / STRUCT / EXT / TYPED_UNION**: no extra bytes (registration by ID required on both sides).
+- **ENUM / STRUCT / EXT / TYPED_UNION**: no extra bytes beyond the `user_type_id` (registration by ID required on both sides).
 - **COMPATIBLE_STRUCT**:
   - If meta share is enabled, write a shared TypeDef entry (see below).
   - If meta share is disabled, no extra bytes.
@@ -912,6 +917,24 @@ else:
 
 Note: TAGGED_INT64 uses 30 bits + sign for values [-2^30, 2^30-1], while TAGGED_UINT64 uses full 31 bits for unsigned values [0, 2^31-1].
 
+#### float8
+
+- size: 1 byte
+- format:
+  - float8 has four kinds, identified by a kind enum: float8_e4m3fn, float8_e4m3fnuz, float8_e5m2, float8_e5m2fnuz
+  - when serialized as a field, write the raw 8 bits as one byte directly
+  - when serialized as an object, write the type kind as a byte, then write the value byte
+
+#### float16
+
+- size: 2 bytes
+- format: encode the value in the IEEE 754 binary16 format, preserving NaN values, then write the bytes in little-endian order.
+
+#### bfloat16
+
+- size: 2 bytes
+- format: encode the value in the bfloat16 format (1 sign bit, 8 exponent bits, 7 mantissa bits; not an IEEE 754 interchange format), preserving NaN values, then write the bytes in little-endian order.
+
 #### float32
 
 - size: 4 byte
@@ -1112,6 +1135,11 @@ then copy the whole buffer into the stream.
 Such serialization won't compress the array. If users want to compress primitive array, users need to register custom
 serializers for such types or mark it as list type.
 
+Float array specifics:
+
+- float16/bfloat16 array: write `varuint` length, then raw bytes in little endian order.
+- float8 array: write element type kind as a byte, then `varuint` length, then raw bytes in little endian order.
+
 #### Multi-dimensional arrays
 
 Xlang does not define a dedicated tensor encoding. Multi-dimensional arrays are serialized as
@@ -1342,15 +1370,15 @@ Rules:
 
 | Type ID | Name        | Meaning                                              |
 | ------: | ----------- | ---------------------------------------------------- |
-|      31 | UNION       | Union value, schema identity not embedded            |
-|      32 | TYPED_UNION | Union value with embedded registered numeric type ID |
-|      33 | NAMED_UNION | Union value with embedded type name / shared TypeDef |
+|      33 | UNION       | Union value, schema identity not embedded            |
+|      34 | TYPED_UNION | Union value with registered numeric type ID          |
+|      35 | NAMED_UNION | Union value with embedded type name / shared TypeDef |
 
 Type meta encoding:
 
-- `UNION (31)`: no additional type meta payload.
-- `TYPED_UNION (32)`: no additional type meta payload (numeric ID is carried in the full type ID itself).
-- `NAMED_UNION (33)`: followed by named type meta (namespace + type name, or shared TypeDef marker/body).
+- `UNION (33)`: no additional type meta payload.
+- `TYPED_UNION (34)`: write `user_type_id` as varuint32 after the type ID.
+- `NAMED_UNION (35)`: followed by named type meta (namespace + type name, or shared TypeDef marker/body).
 
 #### Union value payload
 
@@ -1375,27 +1403,27 @@ This is required even for primitives so unknown alternatives can be skipped safe
 **UNION (schema known from context)**
 
 ```
-| ... outer ref meta ... | type_id=UNION(31) | case_id | case_value |
+| ... outer ref meta ... | type_id=UNION(33) | case_id | case_value |
 ```
 
-**TYPED_UNION (schema embedded by numeric id)**
+**TYPED_UNION (schema identified by numeric id)**
 
 ```
-| ... outer ref meta ... | embedded type id | case_id | case_value |
+| ... outer ref meta ... | type_id=TYPED_UNION(34) | user_type_id | case_id | case_value |
 ```
 
-embedded type id: `type_id=(user_type_id << 8) | TYPED_UNION(32)`
+user_type_id: varuint32 numeric registration ID for the union schema.
 
 **NAMED_UNION (schema embedded by name/typedef)**
 
 ```
-| ... outer ref meta ... | type_id=NAMED_UNION(33) | name_or_typedef | case_id | case_value |
+| ... outer ref meta ... | type_id=NAMED_UNION(35) | name_or_typedef | case_id | case_value |
 ```
 
 #### Decoding rules
 
 1. Read outer ref meta and `type_id`.
-2. If `TYPED_UNION`, resolve the union schema from the full type ID.
+2. If `TYPED_UNION`, read `user_type_id` and resolve the union schema by ID.
 3. If `NAMED_UNION`, read named type meta and resolve the union schema.
 4. Read `case_id`.
 5. Read `case_value` as Any-style value (ref meta + type meta + value).
@@ -1581,7 +1609,7 @@ Meta strings are required for enum and struct serialization (encoding field name
     - [ ] Support registration by numeric ID
     - [ ] Support registration by namespace + type name
     - [ ] Maintain type → serializer mapping
-    - [ ] Generate type IDs: `(user_id << 8) | internal_type_id`
+    - [ ] Generate type IDs: write internal type ID, then `user_type_id` as varuint32
 
 14. **Field Ordering**
     - [ ] Implement the spec-defined grouping and ordering (primitive/boxed/built-in, collections/maps, other)
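The type ID encoding introduced in the hunks above (internal type ID first, then `user_type_id` as a separate varuint32 for ID-registered kinds) can be sketched as follows. This is a minimal illustration, not Fory's implementation: the helper names are made up for this note, and the varuint32 layout (7 payload bits per byte, high bit as continuation flag, least significant group first) is an assumption, since the diff does not spell it out.

```python
# Internal IDs for kinds that carry a user_type_id, per the updated table:
# ENUM=25, STRUCT=27, COMPATIBLE_STRUCT=28, EXT=31, TYPED_UNION=34.
USER_ID_KINDS = {25, 27, 28, 31, 34}
MAX_USER_TYPE_ID = 0xFFFFFFFE


def write_varuint32(buf: bytearray, value: int) -> None:
    """Append value using an assumed 7-bit continuation encoding."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of uint32 range")
    while value >= 0x80:
        buf.append((value & 0x7F) | 0x80)
        value >>= 7
    buf.append(value)


def read_varuint32(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_pos) decoded from data starting at pos."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_type_info(buf: bytearray, internal_type_id: int, user_type_id=None) -> None:
    """Write the internal type ID, then user_type_id if the kind needs one."""
    write_varuint32(buf, internal_type_id)
    if internal_type_id in USER_ID_KINDS:
        if user_type_id is None or not 0 <= user_type_id <= MAX_USER_TYPE_ID:
            raise ValueError("ID-registered kinds need a valid user_type_id")
        write_varuint32(buf, user_type_id)


def read_type_info(data: bytes, pos: int = 0):
    """Return (internal_type_id, user_type_id or None, next_pos)."""
    internal_type_id, pos = read_varuint32(data, pos)
    user_type_id = None
    if internal_type_id in USER_ID_KINDS:
        user_type_id, pos = read_varuint32(data, pos)
    return internal_type_id, user_type_id, pos
```

Under these assumptions a STRUCT registered with user ID 300 takes three bytes (one for the kind, two for the ID), while a built-in such as STRING (21) takes one byte and carries no user ID.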
diff --git a/docs/specification/xlang_type_mapping.md b/docs/specification/xlang_type_mapping.md
index 9390e56d6d..533661b3f2 100644
--- a/docs/specification/xlang_type_mapping.md
+++ b/docs/specification/xlang_type_mapping.md
@@ -25,6 +25,27 @@ Note:
 - `int16_t[n]/vector<T>` indicates `int16_t[n]/vector<int16_t>`
 - The cross-language serialization is not stable, do not use it in your production environment.
 
+## User Type IDs
+
+When registering user types (struct/ext/enum/union), the internal type ID is written as the 8-bit
+kind, and the user type ID is written separately as an unsigned varint32. There is no bit
+shift/packing, and `user_type_id` can be in the range `0~0xFFFFFFFE`.
+
+**Examples:**
+
+| User ID | Type              | Internal ID | Encoded User ID | Decimal |
+| ------- | ----------------- | ----------- | --------------- | ------- |
+| 0       | STRUCT            | 27          | 0               | 0       |
+| 0       | ENUM              | 25          | 0               | 0       |
+| 1       | STRUCT            | 27          | 1               | 1       |
+| 1       | COMPATIBLE_STRUCT | 28          | 1               | 1       |
+| 2       | NAMED_STRUCT      | 29          | 2               | 2       |
+
+When reading type IDs:
+
+- Read internal type ID from the type ID field.
+- If the internal type is a user-registered kind, read `user_type_id` as varuint32.
+
 ## Type Mapping
 
 | Fory Type               | Fory Type ID | Java            | Python            
       | Javascript          | C++                            | Golang          
 | Rust              |
@@ -44,41 +65,45 @@ Note:
 | uint64                  | 13           | long/Long       | 
int/pyfory.fixed_uint64  | Type.uint64()       | uint64_t                       
| uint64           | u64               |
 | var_uint64              | 14           | long/Long       | int/pyfory.uint64 
       | Type.varUInt64()    | uint64_t                       | uint64          
 | u64               |
 | tagged_uint64           | 15           | long/Long       | 
int/pyfory.tagged_uint64 | Type.taggedUInt64() | uint64_t                       
| uint64           | u64               |
-| float16                 | 16           | float/Float     | 
float/pyfory.float16     | Type.float16()      | fory::float16_t                
| fory.float16     | fory::f16         |
-| float32                 | 17           | float/Float     | 
float/pyfory.float32     | Type.float32()      | float                          
| float32          | f32               |
-| float64                 | 18           | double/Double   | 
float/pyfory.float64     | Type.float64()      | double                         
| float64          | f64               |
-| string                  | 19           | String          | str               
       | String              | string                         | string          
 | String/str        |
-| list                    | 20           | List/Collection | list/tuple        
       | array               | vector                         | slice           
 | Vec               |
-| set                     | 21           | Set             | set               
       | /                   | set                            | fory.Set        
 | Set               |
-| map                     | 22           | Map             | dict              
       | Map                 | unordered_map                  | map             
 | HashMap           |
-| enum                    | 23           | Enum subclasses | enum subclasses          | /                   | enum                           | /                | enum              |
-| named_enum              | 24           | Enum subclasses | enum subclasses          | /                   | enum                           | /                | enum              |
-| struct                  | 25           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
-| compatible_struct       | 26           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
-| named_struct            | 27           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
-| named_compatible_struct | 28           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
-| ext                     | 29           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
-| named_ext               | 30           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
-| union                   | 31           | Union           | typing.Union             | /                   | `std::variant<Ts...>`          | /                | tagged union enum |
-| none                    | 32           | null            | None                     | null                | `std::monostate`               | nil              | `()`              |
-| duration                | 33           | Duration        | timedelta                | Number              | duration                       | Duration         | Duration          |
-| timestamp               | 34           | Instant         | datetime                 | Number              | std::chrono::nanoseconds       | Time             | DateTime          |
-| date                    | 35           | Date            | datetime                 | Number              | fory::serialization::Date      | Time             | DateTime          |
-| decimal                 | 36           | BigDecimal      | Decimal                  | bigint              | /                              | /                | /                 |
-| binary                  | 37           | byte[]          | bytes                    | /                   | `uint8_t[n]/vector<T>`         | `[n]uint8/[]T`   | `Vec<uint8_t>`    |
-| array                   | 38           | array           | np.ndarray               | /                   | /                              | array/slice      | Vec               |
-| bool_array              | 39           | bool[]          | ndarray(np.bool\_)       | /                   | `bool[n]`                      | `[n]bool/[]T`    | `Vec<bool>`       |
-| int8_array              | 40           | byte[]          | ndarray(int8)            | /                   | `int8_t[n]/vector<T>`          | `[n]int8/[]T`    | `Vec<i8>`         |
-| int16_array             | 41           | short[]         | ndarray(int16)           | /                   | `int16_t[n]/vector<T>`         | `[n]int16/[]T`   | `Vec<i16>`        |
-| int32_array             | 42           | int[]           | ndarray(int32)           | /                   | `int32_t[n]/vector<T>`         | `[n]int32/[]T`   | `Vec<i32>`        |
-| int64_array             | 43           | long[]          | ndarray(int64)           | /                   | `int64_t[n]/vector<T>`         | `[n]int64/[]T`   | `Vec<i64>`        |
-| uint8_array             | 44           | short[]         | ndarray(uint8)           | /                   | `uint8_t[n]/vector<T>`         | `[n]uint8/[]T`   | `Vec<u8>`         |
-| uint16_array            | 45           | int[]           | ndarray(uint16)          | /                   | `uint16_t[n]/vector<T>`        | `[n]uint16/[]T`  | `Vec<u16>`        |
-| uint32_array            | 46           | long[]          | ndarray(uint32)          | /                   | `uint32_t[n]/vector<T>`        | `[n]uint32/[]T`  | `Vec<u32>`        |
-| uint64_array            | 47           | long[]          | ndarray(uint64)          | /                   | `uint64_t[n]/vector<T>`        | `[n]uint64/[]T`  | `Vec<u64>`        |
-| float16_array           | 48           | float[]         | ndarray(float16)         | /                   | `fory::float16_t[n]/vector<T>` | `[n]float16/[]T` | `Vec<fory::f16>`  |
-| float32_array           | 49           | float[]         | ndarray(float32)         | /                   | `float[n]/vector<T>`           | `[n]float32/[]T` | `Vec<f32>`        |
-| float64_array           | 50           | double[]        | ndarray(float64)         | /                   | `double[n]/vector<T>`          | `[n]float64/[]T` | `Vec<f64>`        |
+| float8                  | 16           | /               | /                        | /                   | /                              | /                | /                 |
+| float16                 | 17           | float/Float     | float/pyfory.float16     | Type.float16()      | fory::float16_t                | fory.float16     | fory::f16         |
+| bfloat16                | 18           | /               | /                        | /                   | /                              | /                | /                 |
+| float32                 | 19           | float/Float     | float/pyfory.float32     | Type.float32()      | float                          | float32          | f32               |
+| float64                 | 20           | double/Double   | float/pyfory.float64     | Type.float64()      | double                         | float64          | f64               |
+| string                  | 21           | String          | str                      | String              | string                         | string           | String/str        |
+| list                    | 22           | List/Collection | list/tuple               | array               | vector                         | slice            | Vec               |
+| set                     | 23           | Set             | set                      | /                   | set                            | fory.Set         | Set               |
+| map                     | 24           | Map             | dict                     | Map                 | unordered_map                  | map              | HashMap           |
+| enum                    | 25           | Enum subclasses | enum subclasses          | /                   | enum                           | /                | enum              |
+| named_enum              | 26           | Enum subclasses | enum subclasses          | /                   | enum                           | /                | enum              |
+| struct                  | 27           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
+| compatible_struct       | 28           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
+| named_struct            | 29           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
+| named_compatible_struct | 30           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
+| ext                     | 31           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
+| named_ext               | 32           | pojo/record     | data class               | object              | struct/class                   | struct           | struct            |
+| union                   | 33           | Union           | typing.Union             | /                   | `std::variant<Ts...>`          | /                | tagged union enum |
+| none                    | 36           | null            | None                     | null                | `std::monostate`               | nil              | `()`              |
+| duration                | 37           | Duration        | timedelta                | Number              | duration                       | Duration         | Duration          |
+| timestamp               | 38           | Instant         | datetime                 | Number              | std::chrono::nanoseconds       | Time             | DateTime          |
+| date                    | 39           | Date            | datetime                 | Number              | fory::serialization::Date      | Time             | DateTime          |
+| decimal                 | 40           | BigDecimal      | Decimal                  | bigint              | /                              | /                | /                 |
+| binary                  | 41           | byte[]          | bytes                    | /                   | `uint8_t[n]/vector<T>`         | `[n]uint8/[]T`   | `Vec<uint8_t>`    |
+| array                   | 42           | array           | np.ndarray               | /                   | /                              | array/slice      | Vec               |
+| bool_array              | 43           | bool[]          | ndarray(np.bool\_)       | /                   | `bool[n]`                      | `[n]bool/[]T`    | `Vec<bool>`       |
+| int8_array              | 44           | byte[]          | ndarray(int8)            | /                   | `int8_t[n]/vector<T>`          | `[n]int8/[]T`    | `Vec<i8>`         |
+| int16_array             | 45           | short[]         | ndarray(int16)           | /                   | `int16_t[n]/vector<T>`         | `[n]int16/[]T`   | `Vec<i16>`        |
+| int32_array             | 46           | int[]           | ndarray(int32)           | /                   | `int32_t[n]/vector<T>`         | `[n]int32/[]T`   | `Vec<i32>`        |
+| int64_array             | 47           | long[]          | ndarray(int64)           | /                   | `int64_t[n]/vector<T>`         | `[n]int64/[]T`   | `Vec<i64>`        |
+| uint8_array             | 48           | short[]         | ndarray(uint8)           | /                   | `uint8_t[n]/vector<T>`         | `[n]uint8/[]T`   | `Vec<u8>`         |
+| uint16_array            | 49           | int[]           | ndarray(uint16)          | /                   | `uint16_t[n]/vector<T>`        | `[n]uint16/[]T`  | `Vec<u16>`        |
+| uint32_array            | 50           | long[]          | ndarray(uint32)          | /                   | `uint32_t[n]/vector<T>`        | `[n]uint32/[]T`  | `Vec<u32>`        |
+| uint64_array            | 51           | long[]          | ndarray(uint64)          | /                   | `uint64_t[n]/vector<T>`        | `[n]uint64/[]T`  | `Vec<u64>`        |
+| float8_array            | 52           | /               | /                        | /                   | /                              | /                | /                 |
+| float16_array           | 53           | float[]         | ndarray(float16)         | /                   | `fory::float16_t[n]/vector<T>` | `[n]float16/[]T` | `Vec<fory::f16>`  |
+| bfloat16_array          | 54           | /               | /                        | /                   | /                              | /                | /                 |
+| float32_array           | 55           | float[]         | ndarray(float32)         | /                   | `float[n]/vector<T>`           | `[n]float32/[]T` | `Vec<f32>`        |
+| float64_array           | 56           | double[]        | ndarray(float64)         | /                   | `double[n]/vector<T>`          | `[n]float64/[]T` | `Vec<f64>`        |
 
 ## Type info(not implemented currently)
 

