chaokunyang commented on issue #3191:
URL: https://github.com/apache/fory/issues/3191#issuecomment-3783327607
Another proposal, which I did not adopt:
This issue specifies how **Fory IDL unions** are represented in the **Fory
xlang** wire format, and how protobuf `oneof`
and FlatBuffers `union` can be mapped to Fory unions.
Design goals:
- Support **schema evolution / unknown union alternatives** (skip unknown
cases safely).
- Support dynamic `Any` carrying a union value and restoring the **exact
union type**.
- Reuse existing Fory value encoding (ref meta + standard type meta + value
bytes) for union case values.
- Keep a **single internal type id** for union (`Types.UNION = 31`) and
encode a “typed union” as a variant of the union payload.
---
## 1. IDL Syntax
### 1.1 Union definition
```fdl
union Contact [id=0] {
  string email = 1;
  int32 phone = 2;
}
```
Rules:
- Each union alternative MUST have a **tag number** (`= 1`, `= 2`, ...).
- Tag numbers MUST be unique within the union.
- Tag numbers SHOULD follow protobuf evolution rules: do not reuse removed
tag numbers.
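
The tag rules above can be checked mechanically. The sketch below is a hypothetical Python helper, not part of any Fory tool. It takes one union's alternatives as `(name, tag)` pairs, where `None` marks a missing tag. It also takes a list of tags from removed alternatives, which is an assumed input because the FDL syntax shown here has no `reserved` declaration.

```python
from typing import Iterable, Optional


def check_union_tags(
    alternatives: Iterable[tuple[str, Optional[int]]],
    removed_tags: Iterable[int] = (),
) -> list[str]:
    """Return the rule violations found in one union (empty list if none)."""
    problems: list[str] = []
    seen: dict[int, str] = {}
    removed = set(removed_tags)  # hypothetical input: tags of deleted alternatives
    for name, tag in alternatives:
        if tag is None:
            # MUST: every alternative carries an explicit tag number.
            problems.append(f"{name}: missing tag number")
            continue
        if tag in seen:
            # MUST: tag numbers are unique within the union.
            problems.append(f"{name}: tag {tag} already used by {seen[tag]}")
        else:
            seen[tag] = name
        if tag in removed:
            # SHOULD: do not reuse the tag of a removed alternative.
            problems.append(f"{name}: tag {tag} belonged to a removed alternative")
    return problems


# The Contact union from 1.1 passes; a duplicated tag and a reused tag do not.
print(check_union_tags([("email", 1), ("phone", 2)]))
print(check_union_tags([("email", 1), ("phone", 1), ("fax", 3)], removed_tags=[3]))
```

Duplicate tags are hard errors. Reuse of removed tags is reported in the same list, but a real tool might downgrade it to a warning, because the rule is only a SHOULD.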
### 1.2 Union usage
```fdl
message Person [id=1] {
  Contact contact = 1;
}
```
---
## 2. Mapping from Other IDLs
### 2.1 Protobuf `oneof` → Fory `union`
Protobuf:
```proto
message Person {
  oneof contact {
    string email = 1;
    int32 phone = 2;
  }
}
```
Mapping:
- The `oneof` group maps to a Fory `union`.
- Each `oneof` field number becomes the union alternative **tag number**
(case `field_id`).
- The `oneof` value is encoded using the union encoding defined in this
document.
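
To illustrate the mapping, here is a hypothetical Python helper that renders the members of a `oneof` as FDL union source. The union name is passed in explicitly, because this proposal does not say how a `oneof` name such as `contact` becomes a union type name. Type names are copied verbatim. That is an assumption which holds for `string` and `int32` in this example, but a real converter would need a proto-to-FDL type table.

```python
def oneof_to_fdl_union(union_name: str, fields: list[tuple[str, str, int]]) -> str:
    """Render a protobuf oneof as FDL union source.

    fields holds (type, name, field_number) for each oneof member; type names
    are copied as-is (an assumption, not a general proto-to-FDL mapping)."""
    lines = [f"union {union_name} {{"]
    for field_type, field_name, field_number in fields:
        # The oneof field number is reused unchanged as the union tag (field_id).
        lines.append(f"  {field_type} {field_name} = {field_number};")
    lines.append("}")
    return "\n".join(lines)


print(oneof_to_fdl_union("Contact", [("string", "email", 1), ("int32", "phone", 2)]))
```

The enclosing message then declares a field of the generated union type, as in 1.2.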
### 2.2 FlatBuffers `union` → Fory `union`
FlatBuffers:
```fbs
union Equipment { Weapon, Monster }
```
Mapping example:
```fdl
union Equipment {
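  Weapon weapon = 1;
  Monster monster = 2;
}
```
- FlatBuffers union discriminator values map to union alternative tag
numbers. FlatBuffers reserves discriminator `0` for the implicit `NONE`
type, so `Weapon` and `Monster` get tags 1 and 2.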
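
FlatBuffers numbers union members implicitly. Discriminator 0 is the built-in `NONE` (no value), and members declared without explicit values get 1, 2, ... in declaration order. The hypothetical Python helper below applies that rule. It forms the FDL field name by lower-casing the member type name, a naming convention assumed from the example rather than specified by this proposal.

```python
def flatbuffers_union_tags(members: list[str]) -> dict[str, int]:
    """Discriminator values of a FlatBuffers union declared without explicit
    values: 0 is the implicit NONE, so members are numbered from 1."""
    return {member: value for value, member in enumerate(members, start=1)}


def flatbuffers_union_to_fdl(union_name: str, members: list[str]) -> str:
    lines = [f"union {union_name} {{"]
    for member, tag in flatbuffers_union_tags(members).items():
        # Assumed convention: field name = lower-cased member type name.
        lines.append(f"  {member} {member.lower()} = {tag};")
    lines.append("}")
    return "\n".join(lines)


# union Equipment { Weapon, Monster }  ->  Weapon = 1, Monster = 2
print(flatbuffers_union_to_fdl("Equipment", ["Weapon", "Monster"]))
```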
---
## 3. Wire Format Overview
A union value is encoded as a normal Fory value:
1. **Reference meta** (NULL/REF/NOT_NULL/REF_VALUE flags) is handled at the
outer object level as usual.
2. **Type meta** contains internal type id `UNION (31)` and optional user
type id/name resolution as per standard rules.
3. The **union payload** (defined below) follows.
This spec defines only the **union payload** layout.
---
## 4. Union Payload Encoding
### 4.1 Key requirement: case value is encoded as `Any`
To ensure a decoder can always skip unknown union alternatives, the union
case value MUST be encoded as a **full Fory value**
(i.e., the same encoding as if the case value were an `Any`):
```text
| field_ref_meta | field_value_type_meta | field_value_bytes |
```
This guarantees that an implementation which does not recognize `field_id`
can still skip the case value by reading
`field_value_type_meta` and invoking standard `skipValue(type_id)` logic.
### 4.2 Union header (discriminator + payload kind)
The union payload starts with a **union header**, encoded as `varuint32`:
```text
union_header = (field_id << 2) | kind
```
- `field_id` is the union alternative tag number (from FDL/proto).
- `kind` occupies the low 2 bits:

| kind | Name          | Meaning                                                                      |
|------|---------------|------------------------------------------------------------------------------|
| 0    | NORMAL        | Union type identity is NOT embedded (requires an externally declared type). |
| 1    | TYPED_BY_ID   | Union embeds its registered numeric type id.                                 |
| 2    | TYPED_BY_NAME | Union embeds its registered type name / typedef reference.                   |
| 3    | RESERVED      | Reserved for future extension.                                               |
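
A minimal Python sketch of packing and unpacking the header follows. It assumes `varuint32` uses the common LEB128-style layout: 7 data bits per byte, least-significant group first, with the high bit as a continuation flag. The function names are illustrative and are not Fory's API. Because `kind` takes two bits, `field_id` must fit in the remaining 30 bits.

```python
KIND_NORMAL, KIND_TYPED_BY_ID, KIND_TYPED_BY_NAME = 0, 1, 2


def write_varuint32(value: int) -> bytes:
    """LEB128-style varint (assumed layout): 7 bits per byte, high bit = more."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def read_varuint32(buf: bytes, pos: int) -> tuple[int, int]:
    """Inverse of write_varuint32; returns (value, position after it)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7
        if shift > 28:
            raise ValueError("varuint32 longer than 5 bytes")


def make_union_header(field_id: int, kind: int) -> int:
    """Pack field_id and kind into the union_header value."""
    if kind not in (KIND_NORMAL, KIND_TYPED_BY_ID, KIND_TYPED_BY_NAME):
        raise ValueError("kind must be 0, 1 or 2 (3 is reserved)")
    if not 0 <= field_id < (1 << 30):
        raise ValueError("field_id must fit in 30 bits")  # kind uses the low 2
    return (field_id << 2) | kind


def split_union_header(header: int) -> tuple[int, int]:
    """Return (field_id, kind)."""
    return header >> 2, header & 0x3


header = make_union_header(40, KIND_TYPED_BY_NAME)
encoded = write_varuint32(header)
value, _ = read_varuint32(encoded, 0)
print(header, encoded.hex(), split_union_header(value))
```

Under this layout, any tag below 32 keeps the header to a single byte in every kind, since `(31 << 2) | 3 = 127`.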
### 4.3 Union payload layouts
#### 4.3.1 NORMAL union payload (`kind = 0`)
```text
| union_header | field_ref_meta | field_value_type_meta | field_value |
```
Use when the union schema/type is known from context, e.g.:
- A struct field declared as `Contact contact`
- A list/map element whose declared generic type is the union type
- Any situation where the deserializer has a declared target union type
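
For illustration, here is a hypothetical Python writer for the `kind = 0` layout. It takes the case value already encoded and writes the ref meta as `NOT_NULL VALUE FLAG` (no ref tracking). It also assumes a case type whose type meta is only its `varuint32` type id. The type id in the usage line is a placeholder, not one of Fory's real ids, and the varint layout is the same LEB128-style assumption as above.

```python
NOT_NULL_VALUE_FLAG = 0xFF  # field_ref_meta for a non-null, untracked value


def write_varuint32(value: int) -> bytes:
    # LEB128-style varint (assumed layout): 7 bits per byte, high bit = more.
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def write_normal_union(field_id: int, case_type_id: int, case_value: bytes) -> bytes:
    """Encode a kind=0 union payload.

    case_value must already be the xlang encoding of the case value, and
    case_type_id is assumed to be a type whose type meta is just its id."""
    return (
        write_varuint32(field_id << 2)   # union_header, kind = 0
        + bytes([NOT_NULL_VALUE_FLAG])   # field_ref_meta
        + write_varuint32(case_type_id)  # field_value_type_meta
        + case_value                     # field_value
    )


# PLACEHOLDER_TYPE_ID is made up for illustration; it is not a real Fory type id.
PLACEHOLDER_TYPE_ID = 7
print(write_normal_union(2, PLACEHOLDER_TYPE_ID, b"\x01\x02").hex())
```

Nothing in the payload names the union, which is why the reader must already know it from the declared field or target type.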
#### 4.3.2 Union typed by registered numeric type id (`kind = 1`)
```text
| union_header | union_type_id | field_ref_meta | field_value_type_meta | field_value |
```
- `union_type_id` is encoded as `varuint32`, using the standard “Full Type ID” rule:
  - `Full Type ID = (user_type_id << 8) | internal_type_id`
  - Named types do not embed a user id; see `kind=2`.
Use when the union schema/type is NOT known from context, most importantly
when a union value is stored inside `Any`.
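
A sketch of the `kind = 1` prefix in Python, assuming the internal part of the full type id is `UNION (31)`. It also assumes, for the usage line, that the `[id=0]` on `Contact` is its user type id. Helper names are hypothetical, and the varint layout is the same LEB128-style assumption used earlier.

```python
UNION = 31            # internal type id Types.UNION
KIND_TYPED_BY_ID = 1


def write_varuint32(value: int) -> bytes:
    # LEB128-style varint (assumed layout): 7 bits per byte, high bit = more.
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def full_type_id(user_type_id: int, internal_type_id: int = UNION) -> int:
    # Full Type ID = (user_type_id << 8) | internal_type_id
    return (user_type_id << 8) | internal_type_id


def write_typed_by_id_prefix(field_id: int, user_type_id: int) -> bytes:
    """union_header + union_type_id; field_ref_meta and the rest follow as in kind=0."""
    header = (field_id << 2) | KIND_TYPED_BY_ID
    return write_varuint32(header) + write_varuint32(full_type_id(user_type_id))


# Contact [id=0], alternative phone (tag 2): header (2 << 2) | 1 = 9, type id 31.
print(write_typed_by_id_prefix(2, 0).hex())
# A union registered with user id 5: union_type_id = (5 << 8) | 31 = 1311.
print(write_typed_by_id_prefix(1, 5).hex())
```

Small user ids still cost only one or two extra bytes, which is the price of making a union inside `Any` self-describing.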
#### 4.3.3 Union typed by name / shared typedef (`kind = 2`)
```text
| union_header | union_type_name_or_typedef | field_ref_meta | field_value_type_meta | field_value |
- `union_type_name_or_typedef` MUST reuse the existing xlang **named type meta** mechanism:
  - If meta share is disabled: write `namespace` + `type_name` as meta strings.
  - If meta share is enabled: write a shared TypeDef marker and TypeDef body
    as defined by the xlang meta share rules.
Use when union type identity must be carried by name (unregistered types or
cross-process name-based resolution),
especially for `Any` payloads.
---
## 5. Reference Meta and Value Type Meta
### 5.1 `field_ref_meta`
`field_ref_meta` is encoded exactly like any other value in xlang:
- NULL FLAG (`0xFD`): null case value, no further bytes for the case value
- REF FLAG (`0xFE` + ref_id): shared reference
- NOT_NULL VALUE FLAG (`0xFF`): non-null, no ref tracking
- REF VALUE FLAG (`0x00`): first occurrence with ref tracking
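
Read as signed bytes, the four flags are -3, -2, -1 and 0. Only the two value flags are followed by `field_value_type_meta` and `field_value`. A small Python sketch of that classification follows; the names are illustrative.

```python
from enum import IntEnum


class RefFlag(IntEnum):
    """field_ref_meta flags, as signed bytes."""
    NULL = -3            # 0xFD: null, nothing else follows
    REF = -2             # 0xFE: followed by ref_id
    NOT_NULL_VALUE = -1  # 0xFF: value follows, not ref-tracked
    REF_VALUE = 0        # 0x00: value follows and is registered for later refs


def read_ref_flag(buf: bytes, pos: int) -> tuple[RefFlag, int]:
    raw = buf[pos]
    signed = raw - 256 if raw >= 0x80 else raw  # reinterpret as int8
    return RefFlag(signed), pos + 1  # raises ValueError for any other byte


def value_follows(flag: RefFlag) -> bool:
    """True if field_value_type_meta and field_value come next."""
    return flag in (RefFlag.NOT_NULL_VALUE, RefFlag.REF_VALUE)


for byte in (0xFD, 0xFE, 0xFF, 0x00):
    flag, _ = read_ref_flag(bytes([byte]), 0)
    print(f"0x{byte:02X} -> {flag.name}, value follows: {value_follows(flag)}")
```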
### 5.2 `field_value_type_meta`
`field_value_type_meta` is encoded exactly like normal xlang **Type Meta**:
- `type_id` as `varuint32`
- Optional meta payload depending on internal type id,
  e.g. `NAMED_STRUCT` uses name strings or a shared TypeDef marker
This is required even for primitives, because it enables safe skipping of
unknown union alternatives.
---
## 6. Decoding Rules
### 6.1 Decoding algorithm (high level)
1. Read `union_header` (`varuint32`).
2. Compute:
   - `kind = union_header & 0x3`
   - `field_id = union_header >> 2`
3. If `kind == 1`: read `union_type_id` and resolve the union schema type
   (for `Any` / dynamic contexts).
4. If `kind == 2`: read `union_type_name_or_typedef` and resolve the union
   schema type.
5. Read `field_ref_meta`.
   - If it is the NULL flag or a REF flag, the case value is complete; stop.
6. Read `field_value_type_meta` (standard xlang type meta).
7. Deserialize the case value using `field_value_type_meta` and populate the
   union result.
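
The steps above can be sketched in Python as follows. The sketch makes several assumptions. `kind = 2` is left unimplemented, because named type meta is out of scope here. `ref_id` is assumed to be a `varuint32`. Type meta is assumed to be just the type id. `read_value` is a hypothetical callback standing in for real value deserialization. A real decoder would also resolve `REF` ids and register `REF_VALUE` values in its ref table, which this sketch does not do.

```python
from dataclasses import dataclass
from typing import Callable, Optional

NULL_FLAG, REF_FLAG, NOT_NULL_VALUE_FLAG, REF_VALUE_FLAG = 0xFD, 0xFE, 0xFF, 0x00


def read_varuint32(buf: bytes, pos: int) -> tuple[int, int]:
    # LEB128-style varint (assumed layout): 7 bits per byte, high bit = more.
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


@dataclass
class UnionValue:
    field_id: int
    union_type_id: Optional[int]  # set only for kind = 1
    value: object                 # None for a null case value


def decode_union(
    buf: bytes,
    pos: int,
    read_value: Callable[[int, bytes, int], tuple[object, int]],
) -> tuple[UnionValue, int]:
    header, pos = read_varuint32(buf, pos)              # step 1
    field_id, kind = header >> 2, header & 0x3          # step 2
    union_type_id = None
    if kind == 1:                                       # step 3
        union_type_id, pos = read_varuint32(buf, pos)
    elif kind == 2:                                     # step 4
        raise NotImplementedError("named type meta is not modeled here")
    elif kind == 3:
        raise ValueError("reserved union kind")
    flag = buf[pos]                                     # step 5
    pos += 1
    if flag == NULL_FLAG:
        return UnionValue(field_id, union_type_id, None), pos
    if flag == REF_FLAG:
        ref_id, pos = read_varuint32(buf, pos)  # a real decoder resolves this id
        return UnionValue(field_id, union_type_id, ("ref", ref_id)), pos
    if flag not in (NOT_NULL_VALUE_FLAG, REF_VALUE_FLAG):
        raise ValueError(f"bad ref flag 0x{flag:02x}")
    type_id, pos = read_varuint32(buf, pos)             # step 6
    value, pos = read_value(type_id, buf, pos)          # step 7
    return UnionValue(field_id, union_type_id, value), pos


def read_one_byte(type_id: int, buf: bytes, pos: int) -> tuple[object, int]:
    # Hypothetical reader for a placeholder one-byte type.
    return buf[pos], pos + 1


# kind=1, field_id=1, union_type_id=31, NOT_NULL, placeholder type 7, value 0x2a.
print(decode_union(b"\x05\x1f\xff\x07\x2a", 0, read_one_byte))
```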
### 6.2 Unknown `field_id` handling
If the decoder does not recognize `field_id` in the resolved union schema:
- It MUST still consume the case value bytes by:
  1. reading `field_ref_meta`
  2. if non-null and not a ref, reading `field_value_type_meta`
  3. invoking standard `skipValue(type_id)` to skip `field_value`
This provides forward compatibility for added union alternatives.
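
A Python sketch of the skip path for an unrecognized `field_id` follows. It is called once the union header and any union type identity have been read. `skip_value` is a hypothetical stand-in for `skipValue(type_id)`, and `ref_id` is again assumed to be a `varuint32`. The usage line relies on a made-up skip table in which placeholder type 7 is a fixed 4-byte value.

```python
from typing import Callable

NULL_FLAG, REF_FLAG = 0xFD, 0xFE


def read_varuint32(buf: bytes, pos: int) -> tuple[int, int]:
    # LEB128-style varint (assumed layout): 7 bits per byte, high bit = more.
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def skip_unknown_case(
    buf: bytes, pos: int, skip_value: Callable[[int, bytes, int], int]
) -> int:
    """Consume the case value of an unrecognized field_id.

    pos points at field_ref_meta; returns the position just past the value."""
    flag = buf[pos]
    pos += 1
    if flag == NULL_FLAG:
        return pos                               # null: nothing else to consume
    if flag == REF_FLAG:
        _ref_id, pos = read_varuint32(buf, pos)  # only the ref id follows
        return pos
    # NOT_NULL_VALUE or REF_VALUE: type meta, then the value itself.
    # (A ref-tracking decoder may still need to reserve a ref slot for REF_VALUE.)
    type_id, pos = read_varuint32(buf, pos)
    return skip_value(type_id, buf, pos)


FIXED_WIDTHS = {7: 4}  # hypothetical skip table keyed by placeholder type id


def skip_fixed(type_id: int, buf: bytes, pos: int) -> int:
    return pos + FIXED_WIDTHS[type_id]


# Skips ff 07 00 00 00 01 and stops at the next byte (0x99), index 6.
print(skip_unknown_case(b"\xff\x07\x00\x00\x00\x01\x99", 0, skip_fixed))
```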
---
## 7. When to Use Each `kind`
- Use `NORMAL (kind=0)` whenever the union schema type is known from the
declared field type or decoding target type.
- Use `TYPED_BY_ID (kind=1)` or `TYPED_BY_NAME (kind=2)` when the union schema
  type is not known from context, especially:
  - union stored in `Any`
  - union stored in an `UNKNOWN` / fully polymorphic field without a
    declared union type
Implementations MAY always use typed forms for simplicity, but `kind=0` is
recommended for smaller payloads when context is available.
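
The guidance above reduces to a small decision. Here it is as a hypothetical Python helper. It assumes the writer knows whether the reader has a declared union type, and whether the union is registered by numeric id (`user_type_id`) or by name (`None`).

```python
from typing import Optional

KIND_NORMAL, KIND_TYPED_BY_ID, KIND_TYPED_BY_NAME = 0, 1, 2


def choose_union_kind(declared_type_known: bool, user_type_id: Optional[int]) -> int:
    """Pick the union payload kind.

    declared_type_known: the reader gets the union type from a field, element,
    or target type. user_type_id: numeric id if registered by id, else None."""
    if declared_type_known:
        return KIND_NORMAL          # smallest payload; the schema comes from context
    if user_type_id is not None:
        return KIND_TYPED_BY_ID     # e.g. a union stored in Any, registered by id
    return KIND_TYPED_BY_NAME       # named or unregistered union type


print(choose_union_kind(True, 5), choose_union_kind(False, 5), choose_union_kind(False, None))
```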
---
## 8. Compatibility and Evolution Notes
- Union alternative tags (`field_id`) MUST be stable identifiers.
- Adding a new alternative is forward compatible:
  - old readers can skip unknown `field_id` because case values are encoded
    as `Any` (with type meta).
- Removing an alternative is backward compatible if:
  - the removed `field_id` is not reused
  - readers treat unknown alternatives as “present but ignored”
---
## 9. Summary
- Keep **one internal type id** for union: `UNION (31)`.
- Encode union discriminator as `union_header = (field_id << 2) | kind`.
- Encode union case values exactly like `Any`: `ref_meta + type_meta +
value`.
- Support `Any` holding unions by optionally embedding **union type
identity** in union payload (`kind=1/2`).
- Use FDL/protobuf tag numbers as union case ids (`field_id`) for stable
evolution.