The GitHub Actions job "C++ Linter" on iceberg-cpp.git/avro_reader has failed.
Run started by GitHub user shangxinli (triggered by shangxinli).

Head commit for run:
eae14b1d31aaf50b6c4b26ed70c56f7b4b7c7dd9 / Xinli Shang <[email protected]>
feat: eliminate GenericDatum in Avro reader for performance

Replace GenericDatum intermediate layer with direct Avro decoder access
to improve manifest I/O performance.

Changes:
- Add avro_direct_decoder_internal.h with DecodeAvroToBuilder API
- Add avro_direct_decoder.cc implementing direct Avro→Arrow decoding
  - Primitive types: bool, int, long, float, double, string, binary, fixed
  - Temporal types: date, time, timestamp
  - Logical types: uuid, decimal (with validation)
  - Nested types: struct, list, map
  - Union type handling with bounds checking
  - Field skipping with proper multi-block handling for arrays/maps
- Modify avro_reader.cc to use DataFileReaderBase with direct decoder
  - Replace DataFileReader<GenericDatum> with DataFileReaderBase
  - Use decoder.decodeInt(), decodeLong(), etc. directly
  - Remove GenericDatum allocation and extraction overhead
- Update CMakeLists.txt to include new decoder source
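
As an illustration of the "multi-block" field skipping mentioned above, here is a
minimal, self-contained sketch. It is not the code from this commit. It relies only
on the Avro binary encoding rules: a long is a zigzag varint, an array is a series
of blocks ended by a zero count, and a negative count is followed by the block's
size in bytes. ByteCursor and SkipLongArray are hypothetical names, and the array
items are assumed to be longs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical minimal cursor over an Avro-encoded byte buffer.
struct ByteCursor {
  const std::vector<uint8_t>& data;
  size_t pos = 0;

  uint8_t ReadByte() {
    if (pos >= data.size()) throw std::out_of_range("unexpected end of buffer");
    return data[pos++];
  }

  // Avro long: base-128 varint (low 7 bits first), then zigzag decoding.
  int64_t ReadLong() {
    uint64_t raw = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      const uint8_t b = ReadByte();
      raw |= static_cast<uint64_t>(b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return static_cast<int64_t>(raw >> 1) ^ -static_cast<int64_t>(raw & 1);
      }
    }
    throw std::runtime_error("varint is longer than 10 bytes");
  }

  void SkipBytes(size_t n) {
    assert(pos <= data.size());
    if (n > data.size() - pos) throw std::out_of_range("skip past end of buffer");
    pos += n;
  }
};

// Skips one Avro array of longs and returns how many items were skipped.
int64_t SkipLongArray(ByteCursor& in) {
  int64_t skipped = 0;
  for (;;) {
    const int64_t count = in.ReadLong();
    if (count == 0) return skipped;  // a zero count ends the array
    if (count < 0) {
      // Negative count: the block's byte size follows, so the whole block
      // can be skipped without decoding its items.
      const int64_t byte_size = in.ReadLong();
      if (byte_size < 0) throw std::runtime_error("negative block size");
      in.SkipBytes(static_cast<size_t>(byte_size));
      skipped += -count;
    } else {
      // Positive count: no byte size is given, so skip item by item.
      for (int64_t i = 0; i < count; ++i) in.ReadLong();
      skipped += count;
    }
  }
}
```

The loop keeps reading blocks until the zero-count terminator, so the cursor ends
up on the first byte after the array even when the writer split it into several
blocks.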

Validation added:
- Union branch bounds checking
- Decimal byte width validation (uses the schema's fixedSize rather than a calculated width)
- Decimal precision sufficiency validation
- Logical type presence validation
- Type mismatch error handling
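
A rough sketch of what two of these checks could look like follows. It is not the
code from this commit. The function names are hypothetical, and it assumes that
"precision sufficiency" means the fixed width must be able to hold the declared
precision. The bound used is the one from the Avro specification for decimals
stored in a fixed of n bytes: floor(log10(2^(8n-1) - 1)).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <string>

// Returns an error message if the decoded union branch is out of range,
// or std::nullopt if the index is valid.
std::optional<std::string> CheckUnionBranch(size_t branch_index,
                                            size_t branch_count) {
  if (branch_index >= branch_count) {
    return "Union branch index " + std::to_string(branch_index) +
           " is out of range [0, " + std::to_string(branch_count) + ")";
  }
  return std::nullopt;
}

// Largest decimal precision that fits in a two's-complement fixed of
// `fixed_size` bytes. Since 2^k is never a power of ten for k >= 1,
// floor(log10(2^k - 1)) equals floor(k * log10(2)).
int MaxDecimalPrecision(size_t fixed_size) {
  if (fixed_size == 0) return 0;
  const double bits = static_cast<double>(8 * fixed_size - 1);
  return static_cast<int>(std::floor(bits * std::log10(2.0)));
}

// True if a fixed of `fixed_size` bytes can represent every value of the
// declared precision.
bool FixedCanHoldPrecision(size_t fixed_size, int precision) {
  assert(precision > 0);
  return precision <= MaxDecimalPrecision(fixed_size);
}
```

With this bound, 4 bytes hold up to 9 digits, 8 bytes up to 18, and 16 bytes up
to 38.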

Documentation:
- Comprehensive API documentation in header
- Schema evolution handling via SchemaProjection explained
- Error handling behavior documented
- Limitations noted (default values not supported)

Performance improvement:
- Before: Avro binary → GenericDatum → Extract → Arrow (3 steps)
- After: Avro binary → decoder.decodeInt() → Arrow (2 steps)
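
The difference between the two paths can be sketched with stand-in types. Nothing
below is Avro, Arrow, or iceberg-cpp API: Datum stands in for a generic datum,
LongSource for a decoder, and a std::vector for an Arrow builder. All names are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a tagged generic datum.
using Datum = std::variant<int64_t, double, std::string>;

// Hypothetical stand-in for a decoder that yields longs one at a time.
struct LongSource {
  std::vector<int64_t> values;
  size_t next = 0;

  int64_t DecodeLong() {
    assert(next < values.size());
    return values[next++];
  }
};

// Before: build an intermediate datum, then extract the value from it.
void AppendViaDatum(LongSource& src, std::vector<int64_t>& column) {
  Datum d(std::in_place_type<int64_t>, src.DecodeLong());
  column.push_back(std::get<int64_t>(d));
}

// After: append the decoded value straight to the column.
void AppendDirect(LongSource& src, std::vector<int64_t>& column) {
  column.push_back(src.DecodeLong());
}
```

Both functions produce the same column. The direct path only drops the
construction and type-checked extraction of the intermediate value.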

This matches the Java implementation, which uses Decoder directly via the
ValueReader interface, avoiding intermediate object allocation.

All 173 avro_test cases pass.

Issue: #332

Report URL: https://github.com/apache/iceberg-cpp/actions/runs/19805365387

With regards,
GitHub Actions via GitBox
