shangxinli opened a new pull request, #374:
URL: https://github.com/apache/iceberg-cpp/pull/374

   Replace GenericDatum intermediate layer with direct Avro decoder access to 
improve manifest I/O performance.
   
   Changes:
   - Add avro_direct_decoder_internal.h with DecodeAvroToBuilder API
   - Add avro_direct_decoder.cc implementing direct Avro→Arrow decoding
     - Primitive types: bool, int, long, float, double, string, binary, fixed
     - Temporal types: date, time, timestamp
     - Logical types: uuid, decimal
     - Nested types: struct, list, map
     - Union type handling for nullable fields
   - Modify avro_reader.cc to use DataFileReaderBase with direct decoder
     - Replace DataFileReader<GenericDatum> with DataFileReaderBase
     - Use decoder.decodeInt(), decodeLong(), etc. directly
     - Remove GenericDatum allocation and extraction overhead
   - Update CMakeLists.txt to include new decoder source
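To illustrate the union handling for nullable fields listed above, here is a minimal, self-contained sketch of decoding a `["null", "long"]` union straight into a columnar buffer with no intermediate datum object. It is not the code from this PR: `ByteReader`, `Int64Column`, and `DecodeNullableLong` are hypothetical stand-ins for the Avro decoder and an Arrow builder, and the null-first branch order is an assumption. It relies only on two facts from the Avro binary encoding: a long is a zig-zag varint, and a union is a branch index followed by the value of that branch.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an Arrow Int64Builder: values plus validity flags.
struct Int64Column {
  std::vector<int64_t> values;
  std::vector<bool> valid;
  void Append(int64_t v) { values.push_back(v); valid.push_back(true); }
  void AppendNull() { values.push_back(0); valid.push_back(false); }
};

// Minimal reader over Avro binary data (illustrative only, not avro::Decoder).
// The buffer must outlive the reader because only a reference is stored.
class ByteReader {
 public:
  explicit ByteReader(const std::vector<uint8_t>& buf) : buf_(buf) {}

  // Avro int/long values are variable-length integers with zig-zag coding.
  int64_t DecodeLong() {
    uint64_t raw = 0;
    int shift = 0;
    while (true) {
      if (pos_ >= buf_.size()) throw std::runtime_error("unexpected end of input");
      if (shift > 63) throw std::runtime_error("varint too long");
      const uint8_t byte = buf_[pos_++];
      raw |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;  // high bit clear marks the last byte
      shift += 7;
    }
    // Undo zig-zag: even raw values are non-negative, odd ones negative.
    return static_cast<int64_t>(raw >> 1) ^ -static_cast<int64_t>(raw & 1);
  }

 private:
  const std::vector<uint8_t>& buf_;
  std::size_t pos_ = 0;
};

// Decodes one value of an assumed ["null", "long"] union directly into the
// column: read the branch index, then either append null or the long itself.
void DecodeNullableLong(ByteReader& in, Int64Column& out) {
  const int64_t branch = in.DecodeLong();
  if (branch == 0) {
    out.AppendNull();  // the null branch carries no payload bytes
  } else if (branch == 1) {
    out.Append(in.DecodeLong());
  } else {
    throw std::runtime_error("unexpected union branch");
  }
}
```

In this sketch each value goes from the byte stream to the column in a single call, which is the shape of the "direct decoder" path; a real implementation would dispatch on the schema and append to Arrow builders instead.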
   
   Performance improvement:
   - Before: Avro binary → GenericDatum → Extract → Arrow (3 steps)
   - After: Avro binary → decoder.decodeInt() → Arrow (2 steps)
   
   This matches the Java implementation, which uses Decoder directly via the 
ValueReader interface, avoiding intermediate object allocation.
   
   All avro_test cases pass.
   
   Issue: #332


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

