mapleFU commented on issue #14923:
URL: https://github.com/apache/arrow/issues/14923#issuecomment-1369959539
The relevant parquet-mr (Java) code looks like this:
```java
private void loadNewBlockToBuffer() throws IOException {
  try {
    minDeltaInCurrentBlock = BytesUtils.readZigZagVarLong(in);
  } catch (IOException e) {
    throw new ParquetDecodingException("can not read min delta in current block", e);
  }

  readBitWidthsForMiniBlocks();

  // mini block is atomic for reading, we read a mini block when there are more values left
  int i;
  for (i = 0; i < config.miniBlockNumInABlock && valuesBuffered < totalValueCount; i++) {
    BytePackerForLong packer = Packer.LITTLE_ENDIAN.newBytePackerForLong(bitWidths[i]);
    unpackMiniBlock(packer);
  }

  // calculate values from deltas unpacked for current block
  int valueUnpacked = i * config.miniBlockSizeInValues;
  for (int j = valuesBuffered - valueUnpacked; j < valuesBuffered; j++) {
    int index = j;
    valuesBuffer[index] += minDeltaInCurrentBlock + valuesBuffer[index - 1];
  }
}

private void readBitWidthsForMiniBlocks() {
  // reads one bit width per mini block, for all miniBlockNumInABlock mini blocks
  for (int i = 0; i < config.miniBlockNumInABlock; i++) {
    try {
      bitWidths[i] = BytesUtils.readIntLittleEndianOnOneByte(in);
    } catch (IOException e) {
      throw new ParquetDecodingException("Can not decode bitwidth in block header", e);
    }
  }
}

private int[] bitWidths;
```
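The second loop in `loadNewBlockToBuffer` turns the unpacked mini-block values back into real values: each one becomes the packed value plus the block's min delta plus the previous value. A minimal standalone sketch of that recurrence, assuming the mini blocks have already been unpacked into a `long[]` (the class and method names are hypothetical, not parquet-mr API):

```java
import java.util.Arrays;

// Hypothetical sketch of the per-block reconstruction step; not parquet-mr code.
public final class DeltaBlockSketch {
    private DeltaBlockSketch() {}

    /**
     * @param previous     the value just before this block (e.g. the page's first value)
     * @param minDelta     the block's min delta
     * @param packedDeltas unpacked mini-block values, each equal to (delta - minDelta)
     * @return the decoded values of this block, in order
     */
    public static long[] decodeBlock(long previous, long minDelta, long[] packedDeltas) {
        long[] out = new long[packedDeltas.length];
        long current = previous;
        for (int j = 0; j < packedDeltas.length; j++) {
            // Same recurrence as the loop above: value[j] = packed[j] + minDelta + value[j - 1]
            current = packedDeltas[j] + minDelta + current;
            out[j] = current;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = decodeBlock(7, -2, new long[] {0, 3, 2});
        System.out.println(Arrays.toString(values)); // prints [5, 6, 6]
    }
}
```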
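* `readBitWidthsForMiniBlocks` reads the bit width of every mini block in the block without validating it.
* `bitWidths[i]` is only used to unpack mini block `i` while `valuesBuffered < totalValueCount`, so the bit widths of trailing mini blocks that hold no values are read but never used.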
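A minimal sketch of the lenient handling these two points imply: read all bit widths, but only validate those of mini blocks that will actually be unpacked. `BitWidthSketch`, its methods, and the 0..64 bound (the widest value a 64-bit integer needs) are illustrative assumptions, not the parquet-mr code or the proposed fix:

```java
// Hypothetical sketch; names and the validation bound are assumptions.
public final class BitWidthSketch {
    private BitWidthSketch() {}

    /**
     * Number of mini blocks that will be unpacked in the current block,
     * i.e. how many iterations the "valuesBuffered < totalValueCount" loop runs.
     */
    public static int miniBlocksToUnpack(int miniBlockNumInABlock, int miniBlockSizeInValues,
                                         long valuesRemaining) {
        // ceil(valuesRemaining / miniBlockSizeInValues), capped at the mini blocks per block
        long needed = (valuesRemaining + miniBlockSizeInValues - 1) / miniBlockSizeInValues;
        return (int) Math.min(miniBlockNumInABlock, needed);
    }

    /** Validates only the bit widths that will be used; trailing widths are ignored. */
    public static void validateUsedBitWidths(int[] bitWidths, int miniBlocksUsed) {
        for (int i = 0; i < miniBlocksUsed; i++) {
            if (bitWidths[i] < 0 || bitWidths[i] > 64) {
                throw new IllegalArgumentException(
                    "invalid bit width " + bitWidths[i] + " in mini block " + i);
            }
        }
    }
}
```

For example, with 4 mini blocks of 32 values and only 40 values left, `miniBlocksToUnpack(4, 32, 40)` is 2, so garbage widths such as 200 in the last two mini blocks never trigger an error.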
I think your fix is much better; let's get it into our codebase :)