[I] [Rust] Add #[fory()] field attributes for optimization metadata [fory]

via GitHub Thu, 04 Dec 2025 21:19:36 -0800


chaokunyang opened a new issue, #3004:
URL: https://github.com/apache/fory/issues/3004


   ## Feature Request
   
   Extend the `#[derive(ForyObject)]` macro to support `#[fory()]` field 
attributes for performance and space optimization during xlang serialization.
   
   ## Is your feature request related to a problem? Please describe
   
   Currently, Fory's Rust xlang serialization treats all struct fields 
uniformly:
   1. **Null checks are always performed** - Even for fields that are never 
null, Fory writes a null/ref flag (1 byte per field)
   2. **Reference tracking is always applied** (when enabled globally) - Even 
for fields that won't be shared/cyclic, objects are tracked with hash lookup 
cost
   3. **Field names use meta string encoding** - In schema evolution mode, 
field names are compressed with meta string encoding, but long field names 
still take several bytes each
   
   These defaults ensure correctness but introduce unnecessary overhead when 
the developer has more specific knowledge about their data model.
   
   ## Describe the solution you'd like
   
   Extend the `#[fory()]` attribute to support field-level metadata:
   
   ```rust
   use fory::ForyObject;
   use std::rc::Rc;
   
   #[derive(ForyObject)]
   struct Foo {
       // Field f1: non-nullable (default), no ref tracking (default)
       // Tag ID 0 provides compact encoding in schema evolution mode
       #[fory(id = 0)]
       f1: String,
       
       // Field f2: non-nullable (default), no ref tracking (default)
       #[fory(id = 1)]
       f2: Bar,
       
       // Field f3: nullable field that may contain null values
       #[fory(id = 2, nullable)]
       f3: Option<String>,
       
       // Field parent: shared reference that needs tracking
       // (e.g., for circular refs)
       #[fory(id = 3, ref, nullable)]
       parent: Option<Rc<Node>>,
       
       // Field with long name: tag ID provides significant space savings
       #[fory(id = 4)]
       very_long_field_name_that_would_take_many_bytes: String,
       
       // Explicit opt-out: keep field name encoding (id = -1) but still
       // declare the field nullable
       #[fory(id = -1, nullable)]
       optional_field: Option<String>,
   }
   ```
   
   ### Attribute Syntax
   
   ```rust
   #[fory(
       id = <i32>,           // REQUIRED: Tag ID for field encoding
                             // >= 0: Use tag ID encoding
                             // -1: Use field name encoding (opt-out)
       
       nullable,             // Optional: Field can be None (default: false)
                             // Required for Option<T> types
       
       ref,                  // Optional: Track references (default: false)
                             // Useful for Rc<T>, Arc<T>, circular references
   )]
   ```
   
   ### Design Decision: Required `id`
   
   The `id` attribute is **required** when using `#[fory()]` on a field:
   - `id = 0` to `id = N`: Use tag ID encoding (compact)
   - `id = -1`: Explicit opt-out, use field name encoding
   
   Rationale:
   1. **Explicit control**: Using `#[fory()]` means opting into explicit control
   2. **Compile-time validation**: Proc macro can check for duplicate IDs
   3. **Proven pattern**: Similar to protobuf field numbers
   
   ### Optimization Details
   
   #### 1. Non-nullable (Default) Optimization
   
   When `nullable` is NOT specified:
   - Skip writing the null flag entirely (1 byte saved per field)
   - Directly serialize the field value
   - Compile error if field type is `Option<T>` without `nullable`
   
   #### 2. No Ref Tracking (Default) Optimization
   
   When `ref` is NOT specified:
   - Skip reference tracking map operations
   - Skip ref flag when combined with non-nullable
   - For `Rc<T>`/`Arc<T>`, consider adding `ref` if circular refs are possible
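
   The cost that `ref` opts a field into can be illustrated with a minimal 
   sketch of pointer-identity tracking. `RefTracker` and `check_or_register` 
   are hypothetical names, not Fory's internals; the sketch only assumes that 
   tracking amounts to one hash map lookup per written object.

   ```rust
   use std::collections::HashMap;
   use std::rc::Rc;

   /// Hypothetical tracker: maps the address of an already-written object
   /// to the ID it was assigned on first write.
   #[derive(Default)]
   struct RefTracker {
       seen: HashMap<usize, u32>,
   }

   impl RefTracker {
       /// Returns `Some(id)` if `value` was written before (the serializer
       /// would emit a back-reference), or `None` after registering it
       /// (the serializer would emit the full value).
       fn check_or_register<T>(&mut self, value: &Rc<T>) -> Option<u32> {
           // Identity is the allocation address, not the value's contents.
           let addr = Rc::as_ptr(value) as usize;
           if let Some(&id) = self.seen.get(&addr) {
               return Some(id);
           }
           let id = self.seen.len() as u32;
           self.seen.insert(addr, id);
           None
       }
   }
   ```

   A field without `ref` would skip this lookup entirely and always write the 
   full value, which is safe only when the value is not shared or cyclic.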
   
   #### 3. Tag ID Optimization
   
   When `id = N` where N >= 0:
   - Field is identified by a varint tag ID instead of a meta-string-encoded 
name
   - Significant space savings for long field names
   
   **Space savings:**
   
   | Field Name | Meta String (approx) | Tag ID |
   |------------|---------------------|--------|
   | `f1` | ~2 bytes | 1 byte |
   | `user_name` | ~6 bytes | 1 byte |
   | `transaction_id` | ~10 bytes | 1 byte |
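
   To make the 1-byte figure in the table concrete, here is a minimal sketch 
   of a 7-bit variable-length integer encoder. It assumes a conventional 
   LEB128-style unsigned varint; `encode_varint` is a hypothetical helper, not 
   Fory's API, and the actual wire format is whatever the protocol spec 
   defines. Under this assumption, tag IDs 0 through 127 take a single byte.

   ```rust
   /// Hypothetical helper: encodes `value` as a little-endian base-128 varint.
   /// Each byte carries 7 payload bits; the high bit means "more bytes follow".
   fn encode_varint(mut value: u32) -> Vec<u8> {
       let mut out = Vec::new();
       loop {
           let byte = (value & 0x7f) as u8;
           value >>= 7;
           if value == 0 {
               // Last byte: high bit stays clear.
               out.push(byte);
               return out;
           }
           out.push(byte | 0x80);
       }
   }
   ```

   For example, `encode_varint(4)` yields one byte, while `encode_varint(128)` 
   needs two, so small, dense tag IDs keep the per-field cost at its minimum.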
   
   ### Implementation Notes
   
   1. **Proc Macro Enhancement**:
      ```rust
      // In fory-derive/src/object.rs
      #[proc_macro_derive(ForyObject, attributes(fory))]
      pub fn derive_fory_object(input: TokenStream) -> TokenStream {
          // Parse #[fory(id = N, nullable, ref)] attributes
          // Generate optimized serialization code based on attributes
      }
      ```
   
   2. **Code Generation**:
      ```rust
      // Generated code for #[fory(id = 0)] (non-nullable, no ref)
      fn serialize_field_f1(&self, writer: &mut Writer) {
          // No null check, no ref tracking
          writer.write_string(&self.f1);
      }
      
      // Generated code for #[fory(id = 2, nullable)]
      fn serialize_field_f3(&self, writer: &mut Writer) {
          match &self.f3 {
              Some(v) => {
                  writer.write_not_null();
                  writer.write_string(v);
              }
              None => writer.write_null(),
          }
      }
      ```
   
   3. **Compile-time Validation**:
      - Error if duplicate tag IDs (>= 0) in same struct
      - Error if `id < -1`
      - Error if `Option<T>` field without `nullable`
      - Warning if `Rc<T>`/`Arc<T>` without `ref` (potential circular ref 
issues)
   
   4. **Runtime Validation**:
      - Panic if non-nullable field serialized with None value (shouldn't 
happen in Rust)
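
   The three compile-time error rules listed under item 3 can be sketched as a 
   plain function over already-parsed field metadata. `FieldSpec` and 
   `validate` are hypothetical names used only for illustration; a real proc 
   macro would work on `syn` types and report failures as compile errors 
   rather than returning strings. The `Rc<T>`/`Arc<T>` warning is left out.

   ```rust
   use std::collections::HashSet;

   /// Hypothetical, simplified view of one annotated field after parsing.
   struct FieldSpec {
       name: &'static str,
       id: i32,
       nullable: bool,
       is_option: bool, // true when the field's type is `Option<T>`
   }

   /// Checks the error rules and returns the first violation found.
   fn validate(fields: &[FieldSpec]) -> Result<(), String> {
       let mut used_ids = HashSet::new();
       for f in fields {
           if f.id < -1 {
               return Err(format!("field `{}`: id must be >= -1", f.name));
           }
           // Only real tag IDs must be unique; -1 (opt-out) may repeat.
           if f.id >= 0 && !used_ids.insert(f.id) {
               return Err(format!("field `{}`: duplicate tag id {}", f.name, f.id));
           }
           if f.is_option && !f.nullable {
               return Err(format!("field `{}`: Option<T> requires `nullable`", f.name));
           }
       }
       Ok(())
   }
   ```

   Returning on the first violation keeps the sketch short; a derive macro 
   could instead collect every violation so that one build reports them all.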
   
   ### Example: Generated Code
   
   ```rust
   #[derive(ForyObject)]
   struct Foo {
       #[fory(id = 0)]
       name: String,
       
       #[fory(id = 1, nullable)]
       nickname: Option<String>,
   }
   
   // Generates approximately:
   impl ForySerialize for Foo {
       fn serialize(&self, writer: &mut Writer) -> Result<()> {
           // Field: name (id=0, non-nullable, no ref)
           writer.write_tag_id(0);
           writer.write_string(&self.name)?;
           
           // Field: nickname (id=1, nullable, no ref)
           writer.write_tag_id(1);
           match &self.nickname {
               Some(v) => {
                   writer.write_byte(NOT_NULL_FLAG);
                   writer.write_string(v)?;
               }
               None => writer.write_byte(NULL_FLAG),
           }
           
           Ok(())
       }
   }
   ```
   
   ### Performance Impact
   
   For a struct with 10 fields using default settings (non-nullable, no ref 
tracking):
   - **Space savings**: ~10 bytes per object (one null/ref flag byte per field)
   - **CPU savings**: up to 10 fewer hash map operations per serialization 
when reference tracking is enabled globally
   - **Zero runtime overhead** for metadata (all compile-time via proc macro)
   
   ## Additional context
   
   This is the Rust equivalent of Java's `@ForyField` annotation. See [Java 
issue #3000](https://github.com/apache/fory/issues/3000) for the original 
design discussion.
   
   Protocol spec: 
https://fory.apache.org/docs/specification/fory_xlang_serialization_spec


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

