thexiay opened a new pull request, #2620: URL: https://github.com/apache/orc/pull/2620
## What changes were proposed in this pull request?

Fix the `estimateRgEndOffset` slop calculation in `RecordReaderUtils.java` to account for the 2-byte RLEv2 DIRECT run header.

### Problem

The old formula:

```java
int stretchFactor = 2 + (MAX_VALUES_LENGTH * MAX_BYTE_WIDTH - 1) / bufferSize;
```

only considers the value payload (512 * 8 = 4096 bytes) but ignores the 2-byte RLE header. For `bufferSize = 1024`, this gives `stretchFactor = 5`, which is one block short when data is incompressible.

### Fix

```java
int maxRleDirectRunSize = MAX_VALUES_LENGTH * MAX_BYTE_WIDTH + 2;
int stretchFactor = 2 + (maxRleDirectRunSize - 1) / bufferSize;
```

This correctly yields `stretchFactor = 6`, ensuring enough compressed blocks are allocated.

## How was this patch tested?

Added `testTruncatedRleV2DirectRunAtEstimatedEndFails` in `TestInStream.java` that:

1. Creates a compressed stream with incompressible (random) data
2. Truncates it at the old estimated end offset
3. Verifies that reading a full RLE v2 DIRECT run fails with `IllegalArgumentException: Buffer size too small`

This proves the old slop estimation was insufficient.

Closes #2619
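
For reference, the arithmetic behind the two formulas can be checked in isolation. The following is a minimal standalone sketch, not the code in `RecordReaderUtils.java`: the class name `SlopSketch`, the method names, and the constant `RLE_DIRECT_HEADER_SIZE` are hypothetical, and the values 512 and 8 are taken from the payload size quoted in the description.

```java
public class SlopSketch {
    // Values quoted in the description: 512 values of up to 8 bytes each.
    static final int MAX_VALUES_LENGTH = 512;
    static final int MAX_BYTE_WIDTH = 8;
    // Hypothetical constant name for the 2-byte RLEv2 DIRECT run header.
    static final int RLE_DIRECT_HEADER_SIZE = 2;

    // Old formula: counts only the value payload (4096 bytes).
    static int oldStretchFactor(int bufferSize) {
        return 2 + (MAX_VALUES_LENGTH * MAX_BYTE_WIDTH - 1) / bufferSize;
    }

    // New formula: payload plus run header (4098 bytes).
    static int newStretchFactor(int bufferSize) {
        int maxRleDirectRunSize =
            MAX_VALUES_LENGTH * MAX_BYTE_WIDTH + RLE_DIRECT_HEADER_SIZE;
        return 2 + (maxRleDirectRunSize - 1) / bufferSize;
    }

    public static void main(String[] args) {
        // Print both factors side by side for a few buffer sizes.
        for (int bufferSize : new int[] {256, 1024, 4096, 65536}) {
            System.out.printf("bufferSize=%d old=%d new=%d%n",
                bufferSize, oldStretchFactor(bufferSize), newStretchFactor(bufferSize));
        }
    }
}
```

With integer division, the two results differ when the 4096-byte payload is an exact multiple of `bufferSize`, for example 256, 1024, or 4096. In that case the extra header bytes spill into one more block. For a large buffer such as 65536, both formulas give the same factor.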
