shangxinli opened a new pull request, #374:
URL: https://github.com/apache/iceberg-cpp/pull/374
Replace the GenericDatum intermediate layer with direct Avro decoder access to
improve manifest I/O performance.

Changes:
- Add avro_direct_decoder_internal.h with DecodeAvroToBuilder API
- Add avro_direct_decoder.cc implementing direct Avro→Arrow decoding
  - Primitive types: bool, int, long, float, double, string, binary, fixed
  - Temporal types: date, time, timestamp
  - Logical types: uuid, decimal
  - Nested types: struct, list, map
  - Union type handling for nullable fields
- Modify avro_reader.cc to use DataFileReaderBase with direct decoder
  - Replace DataFileReader<GenericDatum> with DataFileReaderBase
  - Use decoder.decodeInt(), decodeLong(), etc. directly
  - Remove GenericDatum allocation and extraction overhead
- Update CMakeLists.txt to include new decoder source
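Here is a minimal, self-contained sketch of the direct decoding path described above. It is not the code in this PR. `MiniDecoder`, `Columns` and `DecodeRecord` are hypothetical names. `MiniDecoder` stands in for `avro::Decoder`, plain vectors stand in for Arrow builders, and the record schema is assumed for illustration. The sketch decodes a primitive field, a `["null", "string"]` union and an int array from Avro binary, appending each value to its column as soon as it is read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for avro::Decoder over an in-memory buffer of
// Avro binary-encoded data. Only the calls this sketch needs are here.
class MiniDecoder {
 public:
  explicit MiniDecoder(std::vector<uint8_t> bytes) : buf_(std::move(bytes)) {}

  // Avro int and long are zig-zag encoded variable-length integers.
  int64_t decodeLong() {
    uint64_t raw = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      uint8_t b = Next();
      raw |= static_cast<uint64_t>(b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return static_cast<int64_t>((raw >> 1) ^ (0 - (raw & 1)));
      }
    }
    throw std::runtime_error("varint too long");
  }
  int32_t decodeInt() { return static_cast<int32_t>(decodeLong()); }

  // A union value starts with the zero-based index of its branch.
  size_t decodeUnionIndex() { return static_cast<size_t>(decodeLong()); }

  // Strings are a long byte length followed by UTF-8 bytes.
  std::string decodeString() {
    int64_t len = decodeLong();
    if (len < 0 || static_cast<uint64_t>(len) > buf_.size() - pos_) {
      throw std::runtime_error("bad string length");
    }
    std::string s(reinterpret_cast<const char*>(buf_.data()) + pos_,
                  static_cast<size_t>(len));
    pos_ += static_cast<size_t>(len);
    return s;
  }

  // Arrays are written as blocks: an item count, then the items, ending
  // with a zero count. A negative count is followed by the block's byte
  // size, which this sketch reads and ignores.
  int64_t decodeBlockCount() {
    int64_t n = decodeLong();
    if (n < 0) {
      decodeLong();
      n = -n;
    }
    return n;
  }

 private:
  uint8_t Next() {
    if (pos_ >= buf_.size()) throw std::runtime_error("unexpected end of input");
    return buf_[pos_++];
  }
  std::vector<uint8_t> buf_;
  size_t pos_ = 0;
};

// Hypothetical column buffers standing in for Arrow builders.
struct Columns {
  std::vector<int64_t> id;
  std::vector<std::optional<std::string>> name;  // std::nullopt marks null
  std::vector<int32_t> tag_values;               // flattened list items
  std::vector<int32_t> tag_offsets{0};           // Arrow-style list offsets
};

// Decodes one record of the assumed schema
//   {id: long, name: ["null", "string"], tags: array<int>}
// straight into the columns, without building an intermediate datum.
void DecodeRecord(MiniDecoder& d, Columns& out) {
  out.id.push_back(d.decodeLong());
  // Nullable field: this sketch assumes branch 0 is "null", branch 1 "string".
  if (d.decodeUnionIndex() == 0) {
    out.name.push_back(std::nullopt);
  } else {
    out.name.push_back(d.decodeString());
  }
  // Read array blocks until the terminating zero count.
  for (int64_t n = d.decodeBlockCount(); n != 0; n = d.decodeBlockCount()) {
    for (int64_t i = 0; i < n; ++i) out.tag_values.push_back(d.decodeInt());
  }
  out.tag_offsets.push_back(static_cast<int32_t>(out.tag_values.size()));
}
```

The sketch can be run on two records, `{id: 1, name: "ab", tags: [3, -1]}` and `{id: -2, name: null, tags: []}`, which encode to the bytes `02 02 04 61 62 04 06 01 00 03 00 00`. Calling `DecodeRecord` twice on those bytes fills `id` with `1, -2`, `name` with `"ab"` and a null, and `tag_offsets` with `0, 2, 2`. The real Avro C++ decoder reads array blocks through `arrayStart()`/`arrayNext()` rather than a single block-count helper.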
Performance improvement:
- Before: Avro binary → GenericDatum → Extract → Arrow (3 steps)
- After: Avro binary → decoder.decodeInt() → Arrow (2 steps)
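The following self-contained C++ sketch illustrates the before/after difference. It is not the PR's code. `Reader`, `Column`, `Datum`, `ViaDatum` and `Direct` are hypothetical names, `std::variant` stands in for `GenericDatum`, and the input is assumed to be a sequence of `["null", "long"]` union values. Both functions return the same column. The first materializes every value in an intermediate container and then extracts it in a second pass. The second appends each value as soon as it is decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical reader over Avro binary data; only zig-zag longs are needed.
struct Reader {
  const std::vector<uint8_t>& buf;
  size_t pos = 0;

  int64_t ReadLong() {
    uint64_t raw = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      uint8_t b = buf.at(pos++);  // throws std::out_of_range on truncated input
      raw |= static_cast<uint64_t>(b & 0x7F) << shift;
      if ((b & 0x80) == 0) return static_cast<int64_t>((raw >> 1) ^ (0 - (raw & 1)));
    }
    throw std::runtime_error("varint too long");
  }
};

using Column = std::vector<std::optional<int64_t>>;
// Stand-in for a generic datum: either null or a long.
using Datum = std::variant<std::monostate, int64_t>;

// "Before": decode every value into a Datum first, then walk the datums
// again to extract them into the column.
Column ViaDatum(const std::vector<uint8_t>& bytes, size_t rows) {
  Reader r{bytes};
  std::vector<Datum> datums;
  datums.reserve(rows);
  for (size_t i = 0; i < rows; ++i) {
    if (r.ReadLong() == 0) {  // union branch 0 is "null"
      datums.emplace_back(std::monostate{});
    } else {
      datums.emplace_back(r.ReadLong());
    }
  }
  Column column;
  column.reserve(rows);
  for (const Datum& d : datums) {
    if (const int64_t* v = std::get_if<int64_t>(&d)) {
      column.push_back(*v);
    } else {
      column.push_back(std::nullopt);
    }
  }
  return column;
}

// "After": append each value to the column as soon as it is decoded.
Column Direct(const std::vector<uint8_t>& bytes, size_t rows) {
  Reader r{bytes};
  Column column;
  column.reserve(rows);
  for (size_t i = 0; i < rows; ++i) {
    if (r.ReadLong() == 0) {
      column.push_back(std::nullopt);
    } else {
      column.push_back(r.ReadLong());
    }
  }
  return column;
}
```

For the bytes `02 0A 00 02 05` (three rows: 5, null, -3), both functions return `{5, nullopt, -3}`. The only difference is the extra container and the second pass in `ViaDatum`.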

This matches the Java implementation, which reads directly from the Decoder
through the ValueReader interface and so avoids intermediate object allocation.
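A ValueReader-style design builds a tree of readers once from the schema and then invokes it for every row. It might look roughly like this in C++. This is a hedged sketch, not this PR's code. `Decoder`, `ValueReader`, `LongReader`, `NullableReader` and `StructReader` are hypothetical names rather than Iceberg C++ or Java APIs, and each leaf reader owns a vector that stands in for an Arrow builder. Unlike a reader that returns a value per call, these readers append into columns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Avro binary decoder (longs only).
class Decoder {
 public:
  explicit Decoder(std::vector<uint8_t> bytes) : buf_(std::move(bytes)) {}
  // Zig-zag variable-length long, as in the Avro binary encoding.
  int64_t decodeLong() {
    uint64_t raw = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      if (pos_ >= buf_.size()) throw std::runtime_error("unexpected end of input");
      uint8_t b = buf_[pos_++];
      raw |= static_cast<uint64_t>(b & 0x7F) << shift;
      if ((b & 0x80) == 0) return static_cast<int64_t>((raw >> 1) ^ (0 - (raw & 1)));
    }
    throw std::runtime_error("varint too long");
  }

 private:
  std::vector<uint8_t> buf_;
  size_t pos_ = 0;
};

// One reader per schema node, built once and reused for every row.
class ValueReader {
 public:
  virtual ~ValueReader() = default;
  virtual void Read(Decoder& d) = 0;
  virtual void AppendNull() = 0;
};

// Leaf reader for an Avro long; std::nullopt marks a null slot.
class LongReader : public ValueReader {
 public:
  void Read(Decoder& d) override { column.push_back(d.decodeLong()); }
  void AppendNull() override { column.push_back(std::nullopt); }
  std::vector<std::optional<int64_t>> column;
};

// Reader for a two-branch union of null and T: the branch index decides
// whether the inner reader decodes a value or records a null.
class NullableReader : public ValueReader {
 public:
  NullableReader(std::unique_ptr<ValueReader> inner, size_t null_index)
      : inner_(std::move(inner)), null_index_(null_index) {}
  void Read(Decoder& d) override {
    if (static_cast<size_t>(d.decodeLong()) == null_index_) {
      inner_->AppendNull();
    } else {
      inner_->Read(d);
    }
  }
  void AppendNull() override { inner_->AppendNull(); }

 private:
  std::unique_ptr<ValueReader> inner_;
  size_t null_index_;
};

// Reader for a record: Avro writes fields back to back in schema order.
// Simplification: a null struct just pushes nulls into every child.
class StructReader : public ValueReader {
 public:
  explicit StructReader(std::vector<std::unique_ptr<ValueReader>> fields)
      : fields_(std::move(fields)) {}
  void Read(Decoder& d) override {
    for (auto& f : fields_) f->Read(d);
  }
  void AppendNull() override {
    for (auto& f : fields_) f->AppendNull();
  }

 private:
  std::vector<std::unique_ptr<ValueReader>> fields_;
};
```

Take a record `{a: long, b: ["null", "long"]}` as an example. A `StructReader` holding a `LongReader` and a `NullableReader` (null branch index 0) decodes the bytes `0E 00 01 02 08` into `a = {7, -1}` and `b = {null, 4}`. The schema is walked only once, when the tree is built, so the per-row work is virtual calls and decoder reads.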
All avro_test cases pass.
Issue: #332
--
This is an automated message from the Apache Git Service.