zratkai commented on code in PR #1431:
URL: https://github.com/apache/orc/pull/1431#discussion_r1129189106
##########
java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java:
##########
@@ -2292,10 +2292,12 @@ private void readDictionaryStream(InStream in) throws
IOException {
int dictionaryBufferSize =
dictionaryOffsets[dictionaryOffsets.length - 1];
dictionaryBuffer = new byte[dictionaryBufferSize];
int pos = 0;
- int chunkSize = in.available();
- byte[] chunkBytes = new byte[chunkSize];
+ //check if dictionary size is smaller than available stream size
+ // to avoid ArrayIndexOutOfBoundsException
+ int readSize = Math.min(in.available(), dictionaryBufferSize);
+ byte[] chunkBytes = new byte[readSize];
while (pos < dictionaryBufferSize) {
- int currentLength = in.read(chunkBytes, 0, chunkSize);
+ int currentLength = in.read(chunkBytes, 0, readSize);
System.arraycopy(chunkBytes, 0, dictionaryBuffer, pos,
currentLength);
pos += currentLength;
Review Comment:
@guiyanakuang Thank you for thinking about that. Actually this won't be a
problem, because if readSize is bigger than the available input, the read
just returns the number of bytes that were available here:
int currentLength = in.read(chunkBytes, 0, readSize);
For example:
readSize = 1000
available = 947
then it reads 947 bytes and returns 947, and then
System.arraycopy(chunkBytes, 0, dictionaryBuffer, pos, currentLength) copies
only 947 bytes.
I tested with the org.apache.orc.TestVectorOrcFile#testWithoutIndex where
dictionaryBufferSize = 30947
readSize = 1000
so it reads 1000-byte blocks 30 times, and at the end it reads 947
bytes.
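The short final read can be reproduced outside ORC with a small sketch. It
uses java.io.ByteArrayInputStream as a stand-in for ORC's InStream, which is
an assumption (InStream's buffering may differ), and the class and method
names are made up for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ORC.
final class ShortReadSketch {
  // Reads the stream in chunks of at most readSize bytes and records
  // how many bytes each individual read call returned.
  static List<Integer> chunkLengths(InputStream in, int readSize)
      throws IOException {
    byte[] chunkBytes = new byte[readSize];
    List<Integer> lengths = new ArrayList<>();
    int currentLength;
    // read returns the number of bytes actually read, which may be
    // smaller than readSize, and -1 once the stream is exhausted.
    while ((currentLength = in.read(chunkBytes, 0, readSize)) > 0) {
      lengths.add(currentLength);
    }
    return lengths;
  }
}
```

Calling chunkLengths(new ByteArrayInputStream(new byte[30947]), 1000) should
give 30 reads of 1000 bytes followed by one read of 947 bytes.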
The problem I faced is when available is bigger than the dictionary
size.
In my case:
available = 4096
dictionarySize = 900
Then it reads 4096 bytes and tries to copy them into a 900-byte
dictionaryBuffer, which throws ArrayIndexOutOfBoundsException.
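A minimal sketch of that case, again assuming java.io.ByteArrayInputStream
in place of ORC's InStream. The class name, the capReadSize flag and the
end-of-stream check are additions for illustration and are not in the PR:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for readDictionaryStream, not the ORC code.
final class DictionaryReadSketch {
  static byte[] copyDictionary(InputStream in, int dictionaryBufferSize,
                               boolean capReadSize) throws IOException {
    byte[] dictionaryBuffer = new byte[dictionaryBufferSize];
    int pos = 0;
    // Old behavior: chunk as large as in.available().
    // New behavior: never larger than the dictionary itself.
    int readSize = capReadSize
        ? Math.min(in.available(), dictionaryBufferSize)
        : in.available();
    byte[] chunkBytes = new byte[readSize];
    while (pos < dictionaryBufferSize) {
      int currentLength = in.read(chunkBytes, 0, readSize);
      if (currentLength < 0) {
        // Added guard (assumption): fail clearly on a truncated stream.
        throw new EOFException("stream ended at byte " + pos);
      }
      // Throws if currentLength exceeds the space left in dictionaryBuffer.
      System.arraycopy(chunkBytes, 0, dictionaryBuffer, pos, currentLength);
      pos += currentLength;
    }
    return dictionaryBuffer;
  }
}
```

With a 4096-byte stream and dictionaryBufferSize = 900, capReadSize = false
fails in System.arraycopy, while capReadSize = true returns the first 900
bytes.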
Your solution looks like it could also work. I don't see which of the two
solutions has more advantages.