wombatu-kun opened a new pull request, #16719: URL: https://github.com/apache/iceberg/pull/16719
## What The dictionary-encoded fixed-binary readers (`FixedSizeBinaryDictEncodedReader` and the packed branch of `FixedSizeBinaryReader.nextDictEncodedVal`) allocated a `byte[typeWidth]` per value, copied the decoded bytes into it, then called `FixedSizeBinaryVector.set()`. The numeric dictionary readers in the same file write straight into the vector's data buffer. This switches the fixed-binary ones to the same pattern: `decodeToBinary(id).toByteBuffer()` plus `ArrowBuf.setBytes`, dropping the per-value array. `toByteBuffer()` also positions at the dictionary entry's backing offset (the same thing the var-width dictionary reader already does), so the decode is correct regardless of how the dictionary backs its `Binary` entries. This is an allocation cleanup, not a bug fix: with the current Parquet (1.17.1) the previous `getBytesUnsafe()` returns an exact-size copy, so the result was already correct. ## Benchmark `VectorizedDictFixedBinaryDecodeBenchmark` (new) drives the eager-dictionary fixed-binary decode directly, since end-to-end the path is masked by page reading. JMH AverageTime, `-prof gc`, 4096 16-byte values per op: | per value | before | after | | --- | --- | --- | | allocation | 32 B | 0 B | | time | 15.9 ns | 13.6 ns | The old path allocated a runtime-sized `byte[]` per value (which the JIT cannot scalar-replace); the `toByteBuffer()` plus `setBytes` form lets the JIT scalar-replace the wrapper, so per-value heap allocation drops to zero (about 15% faster). ## Testing Adds two unit tests to `TestVectorizedParquetDefinitionLevelReader` that drive the eager-dictionary fixed-binary decode (the batch reader and the packed-inline branch) with a dictionary entry backed at a non-zero offset. This path previously had no direct coverage; the end-to-end tests only hit the lazy dictionary path. ## Note Companion to #16718 (the bulk-copy fixed-binary read optimization). Both touch `VectorizedParquetDefinitionLevelReader.java` in different methods and both add `iceberg-arrow` to the JMH `jmhProjects` list, so whichever merges second needs a trivial rebase. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
