pvary commented on PR #15410: URL: https://github.com/apache/iceberg/pull/15410#issuecomment-4067463253
I did a first review here: The functional change is correct, but the implementation approach of extending `BaseVectorizedParquetValuesReader` is the wrong abstraction. A standalone reader that follows the established extends `ValuesReader` implements `VectorizedValuesReader` pattern would be cleaner, more maintainable, and consistent with how all the other v2 encoding readers were implemented. **Recommendation** Create a dedicated `VectorizedRleBooleanValuesReader` extends `ValuesReader` implements `VectorizedValuesReader` that: - Only handles boolean `RLE` decoding (bit width 1) - Has its own `initFromPage` that reads the 4-byte length + RLE-decodes booleans - Implements readBoolean() and a batch readBooleans(int total, FieldVector vec, int rowId) if possible - Follows the same extends `ValuesReader` implements `VectorizedValuesReader` pattern as all the other v2 encoding readers This can reuse the RLE decoding logic from `BaseVectorizedParquetValuesReader` by composition (or just inline the small amount of RLE logic needed for bit-width-1 booleans), rather than by inheritance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
