chaokunyang opened a new issue, #3003:
URL: https://github.com/apache/fory/issues/3003

   ## Feature Request
   
   Create a `fory::field<>` template class for field metadata to enable 
performance and space optimization during xlang serialization.
   
   ## Is your feature request related to a problem? Please describe
   
   Currently, Fory's C++ xlang serialization treats all struct fields uniformly:
   1. **Null checks are always performed** - Even for fields that are never 
null, Fory writes a null/ref flag (1 byte per field)
   2. **Reference tracking is always applied** (when enabled globally) - Even for fields that are never shared or cyclic, each object is tracked at the cost of a hash lookup
   3. **Field names use meta string encoding** - In schema evolution mode, field names are compressed with meta string encoding, but long names still take several bytes
   
   These defaults ensure correctness but introduce unnecessary overhead when 
the developer has more specific knowledge about their data model.
   
   ## Describe the solution you'd like
   
   Add a `fory::field<>` template in `field.h` that wraps field types with 
compile-time metadata:
   
   ```cpp
   #include <fory/serialization/field.h>
   #include <memory>
   #include <optional>
   #include <string>
   
   struct Foo {
       // Field f1: non-nullable (default), no ref tracking (default)
       // Tag ID 0 provides compact encoding in schema evolution mode
       fory::field<std::string, fory::id<0>> f1;
       
       // Field f2: non-nullable (default), no ref tracking (default)
       fory::field<Bar, fory::id<1>> f2;
       
       // Field f3: nullable field that may contain null values
       fory::field<std::optional<std::string>, fory::id<2>, fory::nullable> f3;
       
       // Field parent: shared reference that needs tracking (e.g., for circular refs)
       fory::field<std::shared_ptr<Node>, fory::id<3>, fory::ref, fory::nullable> parent;
       
       // Field with long name: tag ID provides significant space savings
       fory::field<std::string, fory::id<4>> very_long_field_name_that_would_take_many_bytes;
       
       // Explicit opt-out: keep field name encoding but still declare nullability
       fory::field<std::optional<std::string>, fory::id<-1>, fory::nullable> optional_field;
   };
   
   // Register with Fory
   FORY_REGISTER_TYPE(Foo);
   ```
   
   ### Template API Design
   
   ```cpp
   #include <utility>  // std::move
   
   namespace fory {
   
   // Tag types for field properties
   template<int N>
   struct id { static constexpr int value = N; };
   
   struct nullable { static constexpr bool value = true; };
   struct ref { static constexpr bool value = true; };
   
   // Field wrapper template
   template<typename T, typename... Props>
   class field {
   public:
       using value_type = T;
       
       // Compile-time property extraction
       static constexpr int tag_id = /* extract from Props... */;
       static constexpr bool is_nullable = /* extract from Props... */;
       static constexpr bool track_ref = /* extract from Props... */;
       
       // Implicit conversion to/from T
       field() = default;
       field(const T& value) : value_(value) {}
       field(T&& value) : value_(std::move(value)) {}
       
       operator T&() { return value_; }
       operator const T&() const { return value_; }
       
       T& get() { return value_; }
       const T& get() const { return value_; }
       
       T* operator->() { return &value_; }
       const T* operator->() const { return &value_; }
       
   private:
       T value_;
   };
   
   } // namespace fory
   ```
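   
   The `/* extract from Props... */` placeholders above could be filled in with a fold expression and a small recursive trait. The following is a minimal C++17 sketch under that assumption, not the proposed implementation: `has_tag_v`, `find_id`, `kMissing`, and `field_traits` are hypothetical names, and the tag types are redeclared in a `sketch` namespace so the block stands alone.
   
   ```cpp
   #include <type_traits>
   
   namespace sketch {
   
   // Tag types, mirroring the proposal (redeclared here for a standalone example).
   template <int N> struct id { static constexpr int value = N; };
   struct nullable {};
   struct ref {};
   
   // True if Tag appears anywhere in Props... (false for an empty pack).
   template <typename Tag, typename... Props>
   inline constexpr bool has_tag_v = (std::is_same_v<Tag, Props> || ...);
   
   // Hypothetical sentinel meaning "no id<N> was supplied".
   inline constexpr int kMissing = -2;
   
   // Walks Props... and yields N from the first id<N>, or kMissing if none.
   template <typename... Props>
   struct find_id { static constexpr int value = kMissing; };
   template <typename P, typename... Rest>
   struct find_id<P, Rest...> { static constexpr int value = find_id<Rest...>::value; };
   template <int N, typename... Rest>
   struct find_id<id<N>, Rest...> { static constexpr int value = N; };
   
   // Compile-time view of a field's properties.
   template <typename T, typename... Props>
   struct field_traits {
       static constexpr int tag_id = find_id<Props...>::value;
       static constexpr bool is_nullable = has_tag_v<nullable, Props...>;
       static constexpr bool track_ref = has_tag_v<ref, Props...>;
       static_assert(tag_id != kMissing, "an id<N> property is required");
       static_assert(tag_id >= -1, "id must be -1 (opt-out) or non-negative");
   };
   
   // Property order does not matter.
   using parent_traits = field_traits<int, id<3>, ref, nullable>;
   static_assert(parent_traits::tag_id == 3);
   static_assert(parent_traits::is_nullable && parent_traits::track_ref);
   
   } // namespace sketch
   ```
   
   Because the lookups are order-independent, `field<T, nullable, id<2>>` and `field<T, id<2>, nullable>` would carry the same metadata.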
   
   ### Alternative: Macro-based Approach
   
   For compatibility with existing codebases that can't change field types:
   
   ```cpp
   struct Foo {
       std::string f1;
       Bar f2;
       std::optional<std::string> f3;
       std::shared_ptr<Node> parent;
   };
   
   // Define field metadata separately
   FORY_FIELD_INFO(Foo,
       FORY_FIELD(f1, id = 0),
       FORY_FIELD(f2, id = 1),
       FORY_FIELD(f3, id = 2, nullable = true),
       FORY_FIELD(parent, id = 3, ref = true, nullable = true)
   );
   ```
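   
   One way such a macro could store its metadata is a trait specialization holding a `constexpr` table. This is only a sketch of what an expansion might look like, assuming C++17: `FieldMeta`, `field_info`, and `id_of` are hypothetical names, and `Bar` and `Node` are stand-in definitions.
   
   ```cpp
   #include <array>
   #include <memory>
   #include <optional>
   #include <string>
   #include <string_view>
   
   namespace sketch {
   
   struct Bar { int x = 0; };                       // stand-in type
   struct Node { std::shared_ptr<Node> parent; };   // stand-in type
   
   struct Foo {
       std::string f1;
       Bar f2;
       std::optional<std::string> f3;
       std::shared_ptr<Node> parent;
   };
   
   // Per-field metadata kept outside the struct definition.
   struct FieldMeta {
       std::string_view name;
       int id;
       bool nullable;
       bool ref;
   };
   
   // Primary template is left undefined: types without metadata do not compile.
   template <typename T> struct field_info;
   
   // What a FORY_FIELD_INFO(Foo, ...) expansion might produce (assumption).
   template <> struct field_info<Foo> {
       static constexpr std::array<FieldMeta, 4> fields{{
           {"f1", 0, false, false},
           {"f2", 1, false, false},
           {"f3", 2, true, false},
           {"parent", 3, true, true},
       }};
   };
   
   // Looks up a tag ID by field name; returns -1 (name encoding) if not listed.
   template <typename T>
   constexpr int id_of(std::string_view name) {
       for (const auto& f : field_info<T>::fields) {
           if (f.name == name) return f.id;
       }
       return -1;
   }
   
   static_assert(id_of<Foo>("f3") == 2);
   
   } // namespace sketch
   ```
   
   The struct itself stays untouched, which is the point of this alternative: existing code keeps using plain `std::string` and `std::shared_ptr` members.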
   
   ### Design Decision: Required `id`
   
   The `id` template parameter is **required**:
   - `fory::id<0>` to `fory::id<N>`: Use tag ID encoding
   - `fory::id<-1>`: Explicit opt-out, use field name encoding
   
   Rationale:
   1. **Explicit control**: Using `fory::field<>` means opting into explicit 
control
   2. **Compile-time validation**: A `static_assert` can reject duplicate tag IDs
   3. **Proven pattern**: Similar to protobuf field numbers
   
   ### Optimization Details
   
   #### 1. Non-nullable (Default) Optimization
   
   When the `nullable` tag is NOT present:
   - Skip writing the null flag entirely (1 byte saved per field)
   - Directly serialize the field value
   - `std::optional<T>` fields must add the `nullable` tag
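   
   The one-byte difference can be seen in a toy writer. This is a sketch only: the flag values, the 1-byte length prefix, and the names `Buffer`, `write_string`, and `write_field` are illustrative assumptions, not Fory's wire format or API.
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <optional>
   #include <string>
   #include <vector>
   
   namespace sketch {
   
   using Buffer = std::vector<std::uint8_t>;
   
   // Illustrative flag bytes (assumed, not taken from the Fory spec).
   constexpr std::uint8_t kNull = 0;
   constexpr std::uint8_t kNotNull = 1;
   
   // Toy string encoding: 1-byte length, then raw bytes (assumes size < 256).
   inline void write_string(Buffer& out, const std::string& s) {
       out.push_back(static_cast<std::uint8_t>(s.size()));
       out.insert(out.end(), s.begin(), s.end());
   }
   
   // Nullable == true expects std::optional<std::string>; false expects std::string.
   template <bool Nullable, typename T>
   void write_field(Buffer& out, const T& value) {
       if constexpr (Nullable) {
           if (!value.has_value()) {
               out.push_back(kNull);      // flag only, no payload
               return;
           }
           out.push_back(kNotNull);       // flag, then payload
           write_string(out, *value);
       } else {
           write_string(out, value);      // no flag byte at all
       }
   }
   
   // Small self-check: the non-nullable path is exactly one byte shorter.
   inline void demo() {
       Buffer plain, flagged;
       write_field<false>(plain, std::string("hi"));
       write_field<true>(flagged, std::optional<std::string>("hi"));
       assert(flagged.size() == plain.size() + 1);
   }
   
   } // namespace sketch
   ```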
   
   #### 2. No Ref Tracking (Default) Optimization
   
   When the `ref` tag is NOT present:
   - Skip reference tracking map operations
   - Skip ref flag when combined with non-nullable
   - For `std::shared_ptr<T>`, consider adding `ref` tag if circular refs are 
possible
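   
   A rough picture of what the `ref` tag turns on and off, assuming C++17. The flag values, the 1-byte ref IDs, and the names `RefWriter` and `write_shared` are made up for illustration and are not Fory's actual format or API. The pointer is assumed to be non-null.
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <memory>
   #include <unordered_map>
   #include <vector>
   
   namespace sketch {
   
   using Buffer = std::vector<std::uint8_t>;
   
   // Illustrative flag bytes (assumed values).
   constexpr std::uint8_t kRef = 0xFE;       // object seen before, ref id follows
   constexpr std::uint8_t kRefValue = 0x00;  // first occurrence, value follows
   
   struct RefWriter {
       // Object address -> ref id; this map is the per-object hash lookup cost.
       std::unordered_map<const void*, std::uint8_t> seen;
   
       // Returns true if the caller still has to write the value.
       bool write_ref_or_flag(Buffer& out, const void* obj) {
           auto [it, inserted] =
               seen.try_emplace(obj, static_cast<std::uint8_t>(seen.size()));
           if (inserted) {
               out.push_back(kRefValue);
               return true;
           }
           out.push_back(kRef);
           out.push_back(it->second);
           return false;
       }
   };
   
   // TrackRef == false skips both the map and the flag byte.
   template <bool TrackRef>
   void write_shared(Buffer& out, RefWriter& refs, const std::shared_ptr<int>& p) {
       if constexpr (TrackRef) {
           if (!refs.write_ref_or_flag(out, p.get())) return;
       }
       out.push_back(static_cast<std::uint8_t>(*p));  // toy 1-byte payload
   }
   
   // Small self-check: the untracked path never touches the map.
   inline void demo() {
       auto p = std::make_shared<int>(7);
       Buffer out;
       RefWriter refs;
       write_shared<false>(out, refs, p);
       assert(refs.seen.empty());
   }
   
   } // namespace sketch
   ```
   
   With tracking on, the second write of the same pointer emits a back-reference instead of the value, which is what makes shared or cyclic graphs round-trip. With tracking off, the same object is simply written twice.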
   
   #### 3. Tag ID Optimization
   
   When `id<N>` is used with N >= 0:
   - The field is identified by a varint tag ID instead of a meta string encoded name
   - Significant space savings for long field names
   
   **Space savings:**
   
   | Field Name | Meta String (approx) | Tag ID |
   |------------|---------------------|--------|
   | `f1` | ~2 bytes | 1 byte |
   | `user_name` | ~6 bytes | 1 byte |
   | `transaction_id` | ~10 bytes | 1 byte |
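   
   The "1 byte" column follows from how varints grow. The sketch below assumes a plain unsigned varint with 7 payload bits per byte; the exact tag ID layout in the Fory spec may differ, and `varint_size` is a hypothetical helper.
   
   ```cpp
   #include <cstdint>
   
   namespace sketch {
   
   // Bytes needed for an unsigned varint with 7 payload bits per byte.
   constexpr int varint_size(std::uint32_t v) {
       int n = 1;
       while (v >= 0x80) {
           v >>= 7;
           ++n;
       }
       return n;
   }
   
   static_assert(varint_size(0) == 1);
   static_assert(varint_size(127) == 1);   // largest 1-byte tag ID
   static_assert(varint_size(128) == 2);
   
   } // namespace sketch
   ```
   
   Under this assumption, a struct would need more than 128 tagged fields before any tag ID takes a second byte.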
   
   ### Implementation Notes
   
   1. **Template Metaprogramming**:
      - Use variadic templates to extract properties
      - Provide `constexpr` accessors for compile-time queries
      - Enable optimizations via `if constexpr`
   
   2. **Serializer Integration**:
      ```cpp
      template<typename T, typename... Props>
      struct Serializer<fory::field<T, Props...>> {
          static void write(Writer& writer, const fory::field<T, Props...>& f) {
              if constexpr (!fory::field<T, Props...>::is_nullable) {
                  // Skip null check, directly serialize
                  Serializer<T>::write(writer, f.get());
              } else {
                  // Write null flag, then value if not null
                  // ...
              }
          }
      };
      ```
   
   3. **Zero Overhead**:
      - `fory::field<T, ...>` should have the same memory layout as `T`
      - All metadata is compile-time only
      - No runtime overhead compared to a raw field
   
   4. **Validation**:
      - `static_assert` for duplicate tag IDs at compile time
      - `static_assert` that rejects `id < -1`
      - Runtime error if non-nullable field has null value
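   
   The layout and validation notes above can be expressed as compile-time checks. This is a sketch assuming C++17: `field_like`, `all_ids_in_range`, and `ids_unique` are hypothetical names. The `sizeof` equality is not guaranteed by the standard, but it holds on mainstream ABIs for a wrapper with a single member.
   
   ```cpp
   #include <string>
   #include <type_traits>
   
   namespace sketch {
   
   // Simplified stand-in for fory::field: one data member, metadata in the type.
   template <typename T, int Id>
   struct field_like {
       T value;
   };
   
   // Zero-overhead expectation: the wrapper adds no bytes and no alignment.
   static_assert(sizeof(field_like<int, 0>) == sizeof(int));
   static_assert(alignof(field_like<int, 0>) == alignof(int));
   static_assert(sizeof(field_like<std::string, 1>) == sizeof(std::string));
   static_assert(std::is_standard_layout_v<field_like<int, 0>>);
   
   // Every id must be -1 (opt-out) or non-negative.
   template <int... Ids>
   inline constexpr bool all_ids_in_range = ((Ids >= -1) && ...);
   
   // Non-negative ids must be pairwise distinct; -1 may repeat.
   template <int... Ids>
   constexpr bool ids_unique() {
       constexpr int n = sizeof...(Ids);
       const int ids[] = {Ids..., 0};  // trailing 0 keeps the array non-empty
       for (int i = 0; i < n; ++i) {
           if (ids[i] < 0) continue;
           for (int j = i + 1; j < n; ++j) {
               if (ids[i] == ids[j]) return false;
           }
       }
       return true;
   }
   
   static_assert(all_ids_in_range<0, 1, 2, -1>);
   static_assert(ids_unique<0, 1, 2, -1, -1>());
   static_assert(!ids_unique<0, 1, 1>());
   
   } // namespace sketch
   ```
   
   A uniqueness check needs to see all of a struct's IDs at once, so it would most naturally run where the struct's field list is known, for example at type registration.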
   
   ### Performance Impact
   
   For a struct with 10 fields using default settings (non-nullable, no ref 
tracking):
   - **Space savings**: ~10 bytes per object (one 1-byte null/ref flag per field)
   - **CPU savings**: up to 10 fewer hash map operations per serialization (when reference tracking is enabled globally)
   - **Zero runtime overhead** for metadata (all compile-time)
   
   ## Additional context
   
   This is the C++ equivalent of Java's `@ForyField` annotation. See [Java 
issue #3000](https://github.com/apache/fory/issues/3000) for the original 
design discussion.
   
   Protocol spec: 
https://fory.apache.org/docs/specification/fory_xlang_serialization_spec


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

