mapleFU commented on issue #14923: URL: https://github.com/apache/arrow/issues/14923#issuecomment-1369959539
The Java reader's code looks like this:

```java
private void loadNewBlockToBuffer() throws IOException {
  try {
    minDeltaInCurrentBlock = BytesUtils.readZigZagVarLong(in);
  } catch (IOException e) {
    throw new ParquetDecodingException("can not read min delta in current block", e);
  }

  readBitWidthsForMiniBlocks();

  // mini block is atomic for reading, we read a mini block when there are more values left
  int i;
  for (i = 0; i < config.miniBlockNumInABlock && valuesBuffered < totalValueCount; i++) {
    BytePackerForLong packer = Packer.LITTLE_ENDIAN.newBytePackerForLong(bitWidths[i]);
    unpackMiniBlock(packer);
  }

  // calculate values from deltas unpacked for current block
  int valueUnpacked = i * config.miniBlockSizeInValues;
  for (int j = valuesBuffered - valueUnpacked; j < valuesBuffered; j++) {
    int index = j;
    valuesBuffer[index] += minDeltaInCurrentBlock + valuesBuffer[index - 1];
  }
}

private void readBitWidthsForMiniBlocks() {
  for (int i = 0; i < config.miniBlockNumInABlock; i++) {
    try {
      bitWidths[i] = BytesUtils.readIntLittleEndianOnOneByte(in);
    } catch (IOException e) {
      throw new ParquetDecodingException("Can not decode bitwidth in block header", e);
    }
  }
}

private int[] bitWidths;
```

* `readBitWidthsForMiniBlocks` reads the bit width of every mini block in the block header but does not validate the values it reads.
* A `bitWidths` entry is only used to unpack a mini block while `valuesBuffered < totalValueCount`, so the bit widths of trailing mini blocks that hold no values are never used.

I think your fix is much better, let's get it into our codebase :)
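To make the second point concrete, here is a minimal, self-contained sketch of checking bit widths only for the mini blocks that actually hold values. It is not the parquet-mr code and not necessarily the fix discussed in this issue: `LazyBitWidthCheck`, `validateUsedBitWidths`, and `MAX_BIT_WIDTH` are hypothetical names, and the 64-bit limit is an assumption for 64-bit values.

```java
public class LazyBitWidthCheck {
    // Assumption for this sketch: deltas of 64-bit values need at most 64 bits.
    static final int MAX_BIT_WIDTH = 64;

    /**
     * Validates the bit widths of the mini blocks that still hold values and
     * returns how many mini blocks that is. Entries after the last used mini
     * block are ignored, mirroring the `valuesBuffered < totalValueCount`
     * loop condition in the reader quoted above.
     */
    static int validateUsedBitWidths(int[] bitWidths, int miniBlockSizeInValues, long valuesRemaining) {
        int used = 0;
        long remaining = valuesRemaining;
        for (int i = 0; i < bitWidths.length && remaining > 0; i++) {
            if (bitWidths[i] < 0 || bitWidths[i] > MAX_BIT_WIDTH) {
                throw new IllegalArgumentException(
                    "invalid bit width " + bitWidths[i] + " in mini block " + i);
            }
            // A mini block is read as a whole, even if it is only partly filled.
            remaining -= miniBlockSizeInValues;
            used++;
        }
        return used;
    }

    public static void main(String[] args) {
        // 4 mini blocks of 32 values each, but only 40 values are left:
        // the first 2 mini blocks are used, the last 2 bit widths are padding.
        int[] bitWidths = {3, 5, 200, 255};
        System.out.println(validateUsedBitWidths(bitWidths, 32, 40)); // prints 2
    }
}
```

With 40 values left, the out-of-range entries `200` and `255` are never inspected. With 100 values left, the third mini block would be needed and the same input would be rejected.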