chaokunyang commented on issue #3191:
URL: https://github.com/apache/fory/issues/3191#issuecomment-3783327607

   Another proposal, which I did not adopt:

   This issue specifies how **Fory IDL unions** are represented in the **Fory xlang** wire format, and how protobuf `oneof` and FlatBuffers `union` can be mapped to Fory unions.
   
   Design goals:
   
   - Support **schema evolution / unknown union alternatives** (skip unknown cases safely).
   - Support dynamic `Any` carrying a union value and restoring the **exact union type**.
   - Reuse existing Fory value encoding (ref meta + standard type meta + value bytes) for union case values.
   - Keep a **single internal type id** for union (`Types.UNION = 31`) and encode “typed union” as a variant in the union payload.
   
   ---
   
   ## 1. IDL Syntax
   
   ### 1.1 Union definition
   
   ```fdl
   union Contact [id=0] {
     string email = 1;
     int32  phone = 2;
   }
   ```
   
   Rules:
   
   - Each union alternative MUST have a **tag number** (`= 1`, `= 2`, ...).
   - Tag numbers MUST be unique within the union.
   - Tag numbers SHOULD follow protobuf evolution rules: do not reuse removed tag numbers.
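
A minimal Python sketch of these rules, assuming a union is described as a list of `(name, tag)` pairs and that tags of removed alternatives are tracked in a `reserved` collection (both hypothetical inputs, not an existing Fory API). It treats reuse of a removed tag as an error, which is stricter than the SHOULD above, and it also enforces the `field_id < 2**30` bound implied by packing `field_id` together with 2 kind bits into a 32-bit `union_header` (section 4.2).

```python
from typing import Dict, Iterable, Tuple

# field_id shares a varuint32 with 2 kind bits, so it must fit in 30 bits.
MAX_FIELD_ID = (1 << 30) - 1


def validate_union_tags(
    alternatives: Iterable[Tuple[str, int]],
    reserved: Iterable[int] = (),
) -> Dict[int, str]:
    """Check the union tag rules; return a tag -> alternative-name map.

    `alternatives` and `reserved` are hypothetical inputs for this sketch.
    """
    reserved_set = set(reserved)
    by_tag: Dict[int, str] = {}
    for name, tag in alternatives:
        if not 0 <= tag <= MAX_FIELD_ID:
            raise ValueError(f"{name}: tag {tag} is outside 0..{MAX_FIELD_ID}")
        if tag in by_tag:
            raise ValueError(f"{name}: tag {tag} is already used by {by_tag[tag]}")
        if tag in reserved_set:
            # Stricter than the SHOULD rule: reusing a removed tag is rejected.
            raise ValueError(f"{name}: tag {tag} belonged to a removed alternative")
        by_tag[tag] = name
    return by_tag


# The Contact union from section 1.1.
print(validate_union_tags([("email", 1), ("phone", 2)]))  # prints {1: 'email', 2: 'phone'}
```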
   
   ### 1.2 Union usage
   
   ```fdl
   message Person [id=1] {
     Contact contact = 1;
   }
   ```
   
   ---
   
   ## 2. Mapping from Other IDLs
   
   ### 2.1 Protobuf `oneof` → Fory `union`
   
   Protobuf:
   
   ```proto
   message Person {
     oneof contact {
       string email = 1;
       int32  phone = 2;
     }
   }
   ```
   
   Mapping:
   
   - The `oneof` group maps to a Fory `union`.
   - Each `oneof` field number becomes the union alternative **tag number** (case `field_id`).
   - The `oneof` value is encoded using the union encoding defined in this document.
   
   ### 2.2 FlatBuffers `union` → Fory `union`
   
   FlatBuffers:
   
   ```fbs
   union Equipment { Weapon, Monster }
   ```
   
   Mapping example:
   
   ```fdl
   union Equipment {
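     Weapon  weapon  = 1;
     Monster monster = 2;
   }
   ```

   - FlatBuffers union discriminator values map to union alternative tag numbers. FlatBuffers reserves discriminator `0` for the implicit `NONE` member and by default numbers the declared members from `1`, so `Weapon` maps to tag `1` and `Monster` to tag `2`.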
   
   ---
   
   ## 3. Wire Format Overview
   
   A union value is encoded as a normal Fory value:
   
   1. **Reference meta** (NULL/REF/NOT_NULL/REF_VALUE flags) is handled at the outer object level as usual.
   2. **Type meta** contains internal type id `UNION (31)` and optional user type id/name resolution as per standard rules.
   3. The **union payload** (defined below) follows.
   
   This spec defines only the **union payload** layout.
   
   ---
   
   ## 4. Union Payload Encoding
   
   ### 4.1 Key requirement: case value is encoded as `Any`
   
   To ensure a decoder can always skip unknown union alternatives, the union case value MUST be encoded as a **full Fory value** (i.e., the same encoding as if the case value were an `Any`):
   
   ```text
   | field_ref_meta | field_value_type_meta | field_value_bytes |
   ```
   
   This guarantees that an implementation which does not recognize `field_id` can still skip the case value by reading `field_value_type_meta` and invoking standard `skipValue(type_id)` logic.
   
   ### 4.2 Union header (discriminator + payload kind)
   
   The union payload starts with a **union header**, encoded as `varuint32`:
   
   ```text
   union_header = (field_id << 2) | kind
   ```
   
   - `field_id` is the union alternative tag number (from FDL/proto).
   - `kind` occupies the low 2 bits:
   
   | kind | Name          | Meaning                                                                |
   |------|---------------|------------------------------------------------------------------------|
   | 0    | NORMAL        | Union type identity is NOT embedded (requires external declared type). |
   | 1    | TYPED_BY_ID   | Union embeds its registered numeric type id.                           |
   | 2    | TYPED_BY_NAME | Union embeds its registered type name / typedef reference.             |
   | 3    | RESERVED      | Reserved for future extension.                                         |
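
The header packing can be sketched in Python as follows. This assumes `varuint32` is the usual little-endian base-128 (LEB128-style) varint, with 7 payload bits per byte and the high bit as a continuation flag; the function names are hypothetical, not part of any Fory API.

```python
from typing import Tuple

KIND_NORMAL, KIND_TYPED_BY_ID, KIND_TYPED_BY_NAME, KIND_RESERVED = 0, 1, 2, 3


def write_varuint32(value: int) -> bytes:
    """LEB128-style varint: 7 bits per byte, high bit set when more bytes follow."""
    if not 0 <= value < (1 << 32):
        raise ValueError("value does not fit in 32 bits")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def read_varuint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, new_pos); a varuint32 is at most 5 bytes long."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7
        if shift > 28:
            raise ValueError("varuint32 longer than 5 bytes")


def encode_union_header(field_id: int, kind: int) -> bytes:
    """union_header = (field_id << 2) | kind, written as a varuint32."""
    if kind not in (KIND_NORMAL, KIND_TYPED_BY_ID, KIND_TYPED_BY_NAME):
        raise ValueError("kind 3 is reserved")
    return write_varuint32((field_id << 2) | kind)


def decode_union_header(buf: bytes, pos: int = 0) -> Tuple[int, int, int]:
    """Return (field_id, kind, new_pos)."""
    header, pos = read_varuint32(buf, pos)
    return header >> 2, header & 0x3, pos


print(encode_union_header(1, KIND_NORMAL).hex())          # 04
print(encode_union_header(40, KIND_TYPED_BY_NAME).hex())  # a201
print(decode_union_header(bytes.fromhex("a201")))         # (40, 2, 2)
```

Under this varint assumption, any `field_id` up to 31 fits in a single header byte together with its kind, since `(31 << 2) | 3 = 127`.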
   
   ### 4.3 Union payload layouts
   
   #### 4.3.1 NORMAL union payload (`kind = 0`)
   
   ```text
   | union_header | field_ref_meta | field_value_type_meta | field_value |
   ```
   
   Use when the union schema/type is known from context, e.g.:
   
   - A struct field declared as `Contact contact`
   - A list/map element whose declared generic type is the union type
   - Any situation where the deserializer has a declared target union type
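
A NORMAL payload is just the header followed by an ordinary Fory value. The Python sketch below assembles one from parts, assuming the caller already has the `type_meta` and `value` bytes from the existing xlang serializer (the example uses opaque placeholder bytes, not real Fory encodings), that the case value is written without reference tracking, and that `varuint32` is an LEB128-style varint. The helper names are hypothetical.

```python
from typing import Optional

NULL_FLAG = 0xFD            # section 5.1
NOT_NULL_VALUE_FLAG = 0xFF  # section 5.1


def write_varuint32(value: int) -> bytes:
    """LEB128-style varint (assumed encoding for varuint32)."""
    if not 0 <= value < (1 << 32):
        raise ValueError("value does not fit in 32 bits")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def normal_union_payload(
    field_id: int, type_meta: Optional[bytes], value: Optional[bytes]
) -> bytes:
    """| union_header | field_ref_meta | field_value_type_meta | field_value |"""
    out = bytearray(write_varuint32(field_id << 2))  # kind = 0 (NORMAL)
    if value is None:
        out.append(NULL_FLAG)  # null case value: nothing else follows
        return bytes(out)
    if type_meta is None:
        raise ValueError("a non-null case value needs its type meta")
    out.append(NOT_NULL_VALUE_FLAG)  # no ref tracking in this sketch
    out += type_meta
    out += value
    return bytes(out)


# Placeholder bytes stand in for real type meta / value encodings.
print(normal_union_payload(2, b"\xaa", b"\x01\x02").hex())  # 08ffaa0102
```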
   
   #### 4.3.2 Union typed by registered numeric type id (`kind = 1`)
   
   ```text
   | union_header | union_type_id | field_ref_meta | field_value_type_meta | field_value |
   ```
   
   - `union_type_id` is encoded as `varuint32`, using the standard “Full Type ID” rule:
     - `Full Type ID = (user_type_id << 8) | internal_type_id`
     - Named types do not embed a user id; see `kind=2`.

   Use when the union schema/type is NOT known from context, most importantly when a union value is stored inside `Any`.
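
The Full Type ID rule and the `kind = 1` prefix can be sketched in Python as below. It assumes `varuint32` is an LEB128-style varint and, for the example, that the `[id=0]` annotation on `Contact` in section 1.1 is its registered user type id; the helper names are hypothetical.

```python
from typing import Tuple

UNION_INTERNAL_TYPE_ID = 31  # Types.UNION in this proposal
KIND_TYPED_BY_ID = 1


def write_varuint32(value: int) -> bytes:
    """LEB128-style varint (assumed encoding for varuint32)."""
    if not 0 <= value < (1 << 32):
        raise ValueError("value does not fit in 32 bits")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def full_type_id(user_type_id: int, internal_type_id: int = UNION_INTERNAL_TYPE_ID) -> int:
    """Full Type ID = (user_type_id << 8) | internal_type_id."""
    if not 0 <= internal_type_id <= 0xFF:
        raise ValueError("internal type id must fit in 8 bits")
    return (user_type_id << 8) | internal_type_id


def split_full_type_id(full: int) -> Tuple[int, int]:
    """Inverse of full_type_id: (user_type_id, internal_type_id)."""
    return full >> 8, full & 0xFF


def typed_by_id_prefix(field_id: int, user_type_id: int) -> bytes:
    """| union_header | union_type_id |; the case value follows as usual."""
    header = (field_id << 2) | KIND_TYPED_BY_ID
    return write_varuint32(header) + write_varuint32(full_type_id(user_type_id))


print(typed_by_id_prefix(1, 0).hex())  # 051f: email case of Contact (user id 0)
print(typed_by_id_prefix(1, 5).hex())  # 059f0a: same case, user id 5
```

A decoder would reverse this with something like `split_full_type_id` and could check that the internal part is `31` before treating the id as a union type.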
   
   #### 4.3.3 Union typed by name / shared typedef (`kind = 2`)
   
   ```text
   | union_header | union_type_name_or_typedef | field_ref_meta | field_value_type_meta | field_value |
   ```
   
   - `union_type_name_or_typedef` MUST reuse the existing xlang **named type meta** mechanism:
     - If meta share is disabled: write `namespace` + `type_name` as meta strings.
     - If meta share is enabled: write a shared TypeDef marker and TypeDef body as defined by the xlang meta share rules.

   Use when union type identity must be carried by name (unregistered types or cross-process name-based resolution), especially for `Any` payloads.
   
   ---
   
   ## 5. Reference Meta and Value Type Meta
   
   ### 5.1 `field_ref_meta`
   
   `field_ref_meta` is encoded exactly like any other value in xlang:
   
   - NULL FLAG (`0xFD`): null case value, no further bytes for the case value
   - REF FLAG (`0xFE` + ref_id): shared reference
   - NOT_NULL VALUE FLAG (`0xFF`): non-null, no ref tracking
   - REF VALUE FLAG (`0x00`): first occurrence with ref tracking
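
A reader for `field_ref_meta` might look like the Python sketch below. The flag values come from the list above; encoding the `ref_id` after the REF flag as an LEB128-style `varuint32` is an assumption of this sketch, as are the function names.

```python
from typing import Optional, Tuple

NULL_FLAG = 0xFD            # -3 as a signed byte
REF_FLAG = 0xFE             # -2
NOT_NULL_VALUE_FLAG = 0xFF  # -1
REF_VALUE_FLAG = 0x00       # 0


def read_varuint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """LEB128-style varint reader; returns (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7
        if shift > 28:
            raise ValueError("varuint32 longer than 5 bytes")


def read_ref_meta(buf: bytes, pos: int) -> Tuple[str, Optional[int], int]:
    """Return (state, ref_id, new_pos) for one field_ref_meta."""
    flag = buf[pos]
    pos += 1
    if flag == NULL_FLAG:
        return "null", None, pos       # case value is null; nothing follows
    if flag == REF_FLAG:
        ref_id, pos = read_varuint32(buf, pos)
        return "ref", ref_id, pos      # points at an earlier value; nothing follows
    if flag == NOT_NULL_VALUE_FLAG:
        return "value", None, pos      # type meta + value follow
    if flag == REF_VALUE_FLAG:
        return "ref_value", None, pos  # type meta + value follow; register for later refs
    raise ValueError(f"unknown ref flag 0x{flag:02x}")


print(read_ref_meta(b"\xfe\x03", 0))  # ('ref', 3, 2)
```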
   
   ### 5.2 `field_value_type_meta`
   
   `field_value_type_meta` is encoded exactly like normal xlang **Type Meta**:
   
   - `type_id` as `varuint32`
   - Optional meta payload depending on internal type id
     - e.g. `NAMED_STRUCT` uses name strings or shared TypeDef marker
   
   This is required even for primitives, because it enables safe skipping of unknown union alternatives.
   
   ---
   
   ## 6. Decoding Rules
   
   ### 6.1 Decoding algorithm (high level)
   
   1. Read `union_header` (`varuint32`).
   2. Compute:
      - `kind = union_header & 0x3`
      - `field_id = union_header >> 2`
   3. If `kind == 1`: read `union_type_id` and resolve the union schema type (for `Any` / dynamic contexts).
   4. If `kind == 2`: read `union_type_name_or_typedef` and resolve the union schema type.
   5. Read `field_ref_meta`.
      - If it is the NULL flag, or the REF flag followed by its `ref_id`, the case value is complete; stop.
   6. Read `field_value_type_meta` (standard xlang type meta).
   7. Deserialize the case value using `field_value_type_meta` and populate the union result.
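
The decoding steps can be sketched in Python as below. Everything the proposal delegates to existing xlang machinery (resolving a type id, reading named type meta, reading type meta, reading a value, looking up a reference) is passed in as a hypothetical hook on `DecodeHooks`; the LEB128-style `varuint32` and the varint-encoded `ref_id` are also assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

NULL_FLAG, REF_FLAG, NOT_NULL_VALUE_FLAG, REF_VALUE_FLAG = 0xFD, 0xFE, 0xFF, 0x00


def read_varuint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """LEB128-style varint reader; returns (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7
        if shift > 28:
            raise ValueError("varuint32 longer than 5 bytes")


@dataclass
class DecodeHooks:
    """Hypothetical stand-ins for existing xlang machinery."""
    resolve_type_id: Callable[[int], Any]                     # kind == 1
    read_named_type: Callable[[bytes, int], Tuple[Any, int]]  # kind == 2
    read_type_meta: Callable[[bytes, int], Tuple[int, int]]   # -> (type_id, pos)
    read_value: Callable[[int, bytes, int], Tuple[Any, int]]  # -> (value, pos)
    lookup_ref: Callable[[int], Any]


def decode_union(buf: bytes, pos: int, hooks: DecodeHooks,
                 declared_union: Optional[Any] = None) -> Tuple[Any, int, Any, int]:
    """Return (union_type, field_id, case_value, new_pos)."""
    header, pos = read_varuint32(buf, pos)              # step 1
    kind, field_id = header & 0x3, header >> 2          # step 2
    if kind == 1:                                       # step 3
        type_id, pos = read_varuint32(buf, pos)
        union_type = hooks.resolve_type_id(type_id)
    elif kind == 2:                                     # step 4
        union_type, pos = hooks.read_named_type(buf, pos)
    elif kind == 0:
        if declared_union is None:
            raise ValueError("NORMAL union payload needs a declared union type")
        union_type = declared_union
    else:
        raise ValueError("union kind 3 is reserved")
    flag = buf[pos]                                     # step 5
    pos += 1
    if flag == NULL_FLAG:
        return union_type, field_id, None, pos
    if flag == REF_FLAG:
        ref_id, pos = read_varuint32(buf, pos)
        return union_type, field_id, hooks.lookup_ref(ref_id), pos
    if flag not in (NOT_NULL_VALUE_FLAG, REF_VALUE_FLAG):
        raise ValueError(f"unknown ref flag 0x{flag:02x}")
    type_id, pos = hooks.read_type_meta(buf, pos)       # step 6
    value, pos = hooks.read_value(type_id, buf, pos)    # step 7
    # A real decoder would also register `value` for later REF flags when
    # flag == REF_VALUE_FLAG; this sketch omits the reference table.
    return union_type, field_id, value, pos


# Toy hooks for illustration only: type meta is one varuint32, values are varuint32s.
toy = DecodeHooks(
    resolve_type_id=lambda tid: f"type#{tid}",
    read_named_type=lambda b, p: ("named", p),
    read_type_meta=read_varuint32,
    read_value=lambda tid, b, p: read_varuint32(b, p),
    lookup_ref=lambda rid: f"ref#{rid}",
)
print(decode_union(bytes.fromhex("08ff632a"), 0, toy, declared_union="Contact"))
# ('Contact', 2, 42, 4)
```

Passing the declared union type in explicitly mirrors section 4.3.1: `kind = 0` is only decodable when the context supplies the type.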
   
   ### 6.2 Unknown `field_id` handling
   
   If the decoder does not recognize `field_id` in the resolved union schema:
   
   - It MUST still consume the case value bytes by:
     1. reading `field_ref_meta`
     2. if non-null and not a ref, reading `field_value_type_meta`
     3. invoking standard `skipValue(type_id)` to skip `field_value`
   
   This provides forward compatibility for added union alternatives.
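
Skipping an unknown alternative needs only the ref flag and the type meta. In this Python sketch, `read_type_meta` and `skip_value` are hypothetical hooks standing in for the standard xlang type-meta reader and `skipValue(type_id)`; the LEB128-style `varuint32` and varint `ref_id` are assumptions.

```python
from typing import Callable, Tuple

NULL_FLAG, REF_FLAG, NOT_NULL_VALUE_FLAG, REF_VALUE_FLAG = 0xFD, 0xFE, 0xFF, 0x00


def read_varuint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """LEB128-style varint reader; returns (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7
        if shift > 28:
            raise ValueError("varuint32 longer than 5 bytes")


def skip_case_value(
    buf: bytes,
    pos: int,
    read_type_meta: Callable[[bytes, int], Tuple[int, int]],
    skip_value: Callable[[int, bytes, int], int],
) -> int:
    """Consume one case value (ref meta + type meta + value); return the new pos."""
    flag = buf[pos]
    pos += 1
    if flag == NULL_FLAG:
        return pos  # nothing else was written
    if flag == REF_FLAG:
        _, pos = read_varuint32(buf, pos)
        return pos  # only the ref id was written
    if flag not in (NOT_NULL_VALUE_FLAG, REF_VALUE_FLAG):
        raise ValueError(f"unknown ref flag 0x{flag:02x}")
    # For REF_VALUE_FLAG a real reader may still need to reserve a reference
    # slot so later ref ids stay aligned; this sketch does not track refs.
    type_id, pos = read_type_meta(buf, pos)
    return skip_value(type_id, buf, pos)


def skip_fixed(type_id: int, buf: bytes, pos: int) -> int:
    """Toy skipValue: hypothetical type 7 is a 4-byte fixed-width value."""
    return pos + {7: 4}[type_id]


data = bytes.fromhex("ff07deadbeef") + b"next"
print(skip_case_value(data, 0, read_varuint32, skip_fixed))  # 6
```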
   
   ---
   
   ## 7. When to Use Each `kind`
   
   - Use `NORMAL (kind=0)` whenever the union schema type is known from the declared field type or decoding target type.
   - Use `TYPED_BY_ID (kind=1)` or `TYPED_BY_NAME (kind=2)` when the union schema type is not known from context, especially:
     - a union stored in `Any`
     - a union stored in an `UNKNOWN` / fully polymorphic field without a declared union type

   Implementations MAY always use typed forms for simplicity, but `kind=0` is recommended for smaller payloads when context is available.
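
The selection guidance amounts to a small decision function. This Python sketch assumes the writer knows whether the reader has a declared union type and whether the union was registered with a numeric id; the `always_typed` switch models the MAY clause. All names are hypothetical.

```python
from typing import Optional

KIND_NORMAL, KIND_TYPED_BY_ID, KIND_TYPED_BY_NAME = 0, 1, 2


def choose_union_kind(
    declared_type_known: bool,
    user_type_id: Optional[int],
    always_typed: bool = False,
) -> int:
    """Pick the union payload kind for one write."""
    if declared_type_known and not always_typed:
        return KIND_NORMAL         # smallest payload: the reader already knows the type
    if user_type_id is not None:
        return KIND_TYPED_BY_ID    # registered by numeric id
    return KIND_TYPED_BY_NAME      # carry namespace + type name (or shared TypeDef)
```

Preferring the numeric id when both are available keeps `Any` payloads smaller, since a short varuint32 id usually costs fewer bytes than namespace and type-name meta strings; that preference is this sketch's choice, not something the proposal mandates.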
   
   ---
   
   ## 8. Compatibility and Evolution Notes
   
   - Union alternative tags (`field_id`) MUST be stable identifiers.
   - Adding a new alternative is forward compatible:
     - old readers can skip unknown `field_id` because case values are encoded as `Any` (with type meta).
   - Removing an alternative is backward compatible if:
     - the removed `field_id` is not reused
     - readers treat unknown alternatives as “present but ignored”
   
   ---
   
   ## 9. Summary
   
   - Keep **one internal type id** for union: `UNION (31)`.
   - Encode union discriminator as `union_header = (field_id << 2) | kind`.
   - Encode union case values exactly like `Any`: `ref_meta + type_meta + value`.
   - Support `Any` holding unions by optionally embedding **union type identity** in union payload (`kind=1/2`).
   - Use FDL/protobuf tag numbers as union case ids (`field_id`) for stable evolution.
   

