This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-1.8
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/branch-1.8 by this push:
     new 5576f18ef ORC-1384: Fix `ArrayIndexOutOfBoundsException` when reading dictionary stream bigger then dictionary
5576f18ef is described below

commit 5576f18efcedd1136f67ac220771da25f7d018b8
Author: Zoltan Ratkai <[email protected]>
AuthorDate: Thu Mar 9 12:51:22 2023 -0800

    ORC-1384: Fix `ArrayIndexOutOfBoundsException` when reading dictionary stream bigger then dictionary
    
    ### What changes were proposed in this pull request?
    Avoid `ArrayIndexOutOfBoundsException` when the dictionary stream is larger than the dictionary. Compare the dictionary size with the available input size and read only the smaller of the two.
    
    ### Why are the changes needed?
    In Hive, LLAP reads data in 4 kB blocks, which leads to `ArrayIndexOutOfBoundsException` when the dictionary is smaller than the block.
    
    ### How was this patch tested?
    It was tested with Hive's qtest, because the necessary subclasses are not available here.
    
    Closes #1431 from zratkai/ORC-1384.
    
    Lead-authored-by: Zoltan Ratkai <[email protected]>
    Co-authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 8cf9057fc498f977125be3b721daf2170330b3f9)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java b/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java
index fff419956..f5ed69dc2 100644
--- a/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java
+++ b/java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java
@@ -2235,10 +2235,15 @@ public class TreeReaderFactory {
           int dictionaryBufferSize = dictionaryOffsets[dictionaryOffsets.length - 1];
           dictionaryBuffer = new byte[dictionaryBufferSize];
           int pos = 0;
-          int chunkSize = in.available();
-          byte[] chunkBytes = new byte[chunkSize];
+          // check if dictionary size is smaller than available stream size
+          // to avoid ArrayIndexOutOfBoundsException
+          int readSize = Math.min(in.available(), dictionaryBufferSize);
+          byte[] chunkBytes = new byte[readSize];
           while (pos < dictionaryBufferSize) {
-            int currentLength = in.read(chunkBytes, 0, chunkSize);
+            int currentLength = in.read(chunkBytes, 0, readSize);
+            // check if dictionary size is smaller than available stream size
+            // to avoid ArrayIndexOutOfBoundsException
+            currentLength = Math.min(currentLength, dictionaryBufferSize - pos);
             System.arraycopy(chunkBytes, 0, dictionaryBuffer, pos, currentLength);
             pos += currentLength;
           }
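
For readers who want to try the bounded read outside of ORC, the loop from the hunk above can be sketched as a standalone program. This is a minimal illustration, not the code in TreeReaderFactory: the class name BoundedDictionaryRead, the method readDictionary, and the 4096-byte test stream are hypothetical, and the end-of-stream check and the minimum read size of 1 are assumptions added here that are not part of the commit.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical standalone sketch of the bounded dictionary read.
final class BoundedDictionaryRead {

  static byte[] readDictionary(InputStream in, int dictionaryBufferSize)
      throws IOException {
    byte[] dictionaryBuffer = new byte[dictionaryBufferSize];
    // Never request more bytes than the dictionary can hold.
    // Assumption (not in the commit): use at least 1 so that a stream
    // reporting available() == 0 cannot make read() return 0 forever.
    int readSize = Math.max(1, Math.min(in.available(), dictionaryBufferSize));
    byte[] chunkBytes = new byte[readSize];
    int pos = 0;
    while (pos < dictionaryBufferSize) {
      int currentLength = in.read(chunkBytes, 0, readSize);
      // Assumption (not in the commit): fail clearly on a truncated stream.
      if (currentLength < 0) {
        throw new EOFException("stream ended after " + pos + " of "
            + dictionaryBufferSize + " dictionary bytes");
      }
      // Copy only what still fits into the dictionary buffer.
      currentLength = Math.min(currentLength, dictionaryBufferSize - pos);
      System.arraycopy(chunkBytes, 0, dictionaryBuffer, pos, currentLength);
      pos += currentLength;
    }
    return dictionaryBuffer;
  }

  public static void main(String[] args) throws IOException {
    // A 4096-byte stream stands in for a 4 kB block; the dictionary
    // needs only the first 10 bytes.
    byte[] block = new byte[4096];
    for (int i = 0; i < block.length; i++) {
      block[i] = (byte) i;
    }
    byte[] dictionary = readDictionary(new ByteArrayInputStream(block), 10);
    System.out.println(dictionary.length);
  }
}
```

With the pre-patch logic, the same input would request all 4096 available bytes and then try to copy them into a 10-byte buffer, which is the out-of-bounds copy the commit message describes.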
