chaokunyang opened a new issue, #3285: URL: https://github.com/apache/fory/issues/3285
### Feature Request

Add **bfloat16** and **bfloat16_array** support to Fory Rust runtime and codegen, following the same structure as the existing float16 work.

Related float16 issue: #3207 (open)

### Is your feature request related to a problem? Please describe

We want to use `bfloat16` (BF16) in FDL to reduce payload size while keeping a wide exponent range (common in ML/AI workflows). Fory currently lacks a BF16 primitive and optimized arrays.

### Describe the solution you'd like

#### 1) FDL / Type System

- Introduce a new primitive type: `bfloat16`.
- Add `bfloat16_array` as a packed array type (or treat it as a canonical optimized form of `repeated bfloat16`).
- Allow `bfloat16` in message fields, repeated fields, map values, and unions (where primitives are allowed).

#### 2) Wire Format / Serialization Semantics

- Encode `bfloat16` as **2 bytes** representing the raw IEEE 754 bfloat16 bit pattern.
- Endianness must match existing float32/float64 behavior.
- NaN/Inf/±0/subnormal must round-trip correctly at the bit level (document NaN policy if canonicalized).

#### 3) Rust Runtime (core requirement)

Provide a public strong type and conversions, not raw `uint16` in public APIs.

##### 3.1 Type definition

- Type: `BFloat16`
- Suggested representation: `#[repr(transparent)] struct BFloat16(u16);`
- Provide `to_bits()` / `from_bits()` equivalents.

##### 3.2 Conversions (IEEE 754 compliant)

- Convert float32 -> bfloat16 with **round-to-nearest, ties-to-even**.
- Convert bfloat16 -> float32 exactly.
- Correct handling of NaN/Inf/±0/subnormals.

##### 3.3 Arrays / `bfloat16_array`

- Provide `BFloat16Array` as a packed contiguous array of u16 bits.
- Suggested storage: `BFloat16Array(Vec<u16>)` or `Vec<BFloat16>` with packed storage helpers.
- Mapping: `bfloat16_array` -> `BFloat16Array`, `repeated bfloat16` -> `Vec<BFloat16>`

##### 3.4 Serialization boundary

- Serializer/deserializer should read/write the underlying 16-bit pattern.
- Public APIs should not expose raw `uint16` for values.

#### 4) Codegen requirement (Rust)

- Generated fields for `bfloat16` must use `BFloat16`.
- Generated fields for `bfloat16_array` must use `BFloat16Array` or the packed container noted above.

#### 5) Compiler/Reflection integration

- Update type resolver/reflection to treat `BFloat16` as the `bfloat16` primitive (distinct from `uint16`).

#### 6) Tests

- Scalar conversions (±0, ±Inf, NaN, subnormals, max/min normal).
- Array serialization tests for packed `bfloat16_array`.
- Round-trip tests inside messages (including repeated/map use).

### Describe alternatives you've considered

1) Store BF16 as float32 in memory and downcast only on serialization.
   - Rejected: increases memory footprint and defers rounding, causing cross-language inconsistencies.
2) Expose raw `uint16` in public APIs.
   - Rejected: error-prone and inconsistent with strong-typed primitives.

### Additional context

This issue mirrors the structure of the float16 work but targets bfloat16 + bfloat16_array.
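
For illustration, the type and conversions requested in sections 3.1 and 3.2 could be sketched roughly as follows. This is a minimal sketch and not existing Fory code: the method names `from_f32` and `to_f32` are assumptions, and forcing the quiet bit on NaN is just one possible NaN policy (it keeps a NaN from truncating to an infinity bit pattern).

```rust
/// Hypothetical strong type wrapping the raw 16-bit pattern (not Fory's actual API).
#[repr(transparent)]
#[derive(Clone, Copy, Debug)]
pub struct BFloat16(u16);

impl BFloat16 {
    /// Wrap a raw bit pattern without any conversion.
    pub const fn from_bits(bits: u16) -> Self {
        BFloat16(bits)
    }

    /// Return the raw bit pattern.
    pub const fn to_bits(self) -> u16 {
        self.0
    }

    /// float32 -> bfloat16 with round-to-nearest, ties-to-even.
    pub fn from_f32(value: f32) -> Self {
        let bits = value.to_bits();
        if value.is_nan() {
            // Assumed NaN policy: keep the sign and upper payload bits, and set
            // the quiet bit so the result is still a NaN after truncation.
            return BFloat16(((bits >> 16) as u16) | 0x0040);
        }
        // Adding 0x7FFF plus the lowest kept bit rounds to nearest and
        // resolves exact ties toward an even mantissa.
        let lsb = (bits >> 16) & 1;
        let rounded = bits.wrapping_add(0x7FFF + lsb);
        BFloat16((rounded >> 16) as u16)
    }

    /// bfloat16 -> float32 is exact: the 16 bits become the upper half of the f32.
    pub fn to_f32(self) -> f32 {
        f32::from_bits((self.0 as u32) << 16)
    }
}
```

Under this sketch, `BFloat16::from_f32(1.0).to_bits()` is `0x3F80`, and a value exactly halfway between two representable bfloat16 values rounds to the one whose lowest mantissa bit is zero. Values too large for bfloat16, such as `f32::MAX`, round up to infinity, which is what round-to-nearest prescribes.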

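
The packed element payload described in sections 2 and 3.4 might look like the sketch below. The helper names `encode_bf16_bits` and `decode_bf16_bits` are hypothetical, and little-endian byte order is an assumption here; the issue only requires that endianness match the existing float32/float64 behavior. Any length prefix or type header that the real wire format writes is out of scope.

```rust
/// Hypothetical helper: write each 16-bit pattern as 2 bytes, back to back.
/// Little-endian is assumed; the real format must match float32/float64.
pub fn encode_bf16_bits(values: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 2);
    for bits in values {
        out.extend_from_slice(&bits.to_le_bytes());
    }
    out
}

/// Hypothetical helper: read the bit patterns back.
/// Returns `None` if the byte length is not a multiple of 2.
pub fn decode_bf16_bits(bytes: &[u8]) -> Option<Vec<u16>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect(),
    )
}
```

Because the helpers copy bit patterns and never go through a float conversion, NaN payloads, signed zeros, and subnormals survive the round trip unchanged, which is the bit-level guarantee section 2 asks for.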