ConfX created HIVE-27519:
----------------------------

             Summary: Infinite array growth when optimized hashtable size is set to 0
                 Key: HIVE-27519
                 URL: https://issues.apache.org/jira/browse/HIVE-27519
             Project: Hive
          Issue Type: Bug
            Reporter: ConfX
         Attachments: reproduce.sh

h2. What happened:

When the optimized hashtable size is set to 0 via 
{{hive.mapjoin.optimized.hashtable.wbsize}}, the buffer list in 
{{WriteBuffers.java#nextBufferToWrite()}} grows without bound until the JVM 
runs out of heap space and the system crashes.
h2. Buggy code:
{noformat}
  private void nextBufferToWrite() {
    if (writePos.bufferIndex == (writeBuffers.size() - 1)) {
      if ((1 + writeBuffers.size()) * ((long)wbSize) > maxSize) {   // <--- always false because wbSize is 0
        throw new RuntimeException("Too much memory used by write buffers");
      }
      writeBuffers.add(new byte[wbSize]);  // <---- wbSize is 0 here
    }
    ++writePos.bufferIndex;
    writePos.buffer = writeBuffers.get(writePos.bufferIndex);
    writePos.offset = 0;
  }{noformat}
When the optimized hashtable size is set to 0, {{wbSize}} is 0, so each call 
to {{writeBuffers.add()}} appends a zero-length byte array. The condition 
{{if (writePos.bufferIndex == (writeBuffers.size() - 1))}} stays true because 
{{writePos.bufferIndex}} and the size of {{writeBuffers}} both increase by one 
on every call. The guard {{if ((1 + writeBuffers.size()) * ((long)wbSize) > 
maxSize)}} never becomes true because its left-hand side is always 0, so the 
RuntimeException is never thrown. The method therefore keeps appending 
zero-length byte arrays to {{writeBuffers}} until the JVM runs out of memory 
and the system crashes.
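
The effect of the size check can be seen in isolation with a small standalone model. This is only a sketch: {{ZeroWbSizeDemo}}, {{buffersUntilLimit}} and the {{maxSteps}} cap are hypothetical names introduced here so the demo terminates, and the class does not use or reproduce the real {{WriteBuffers}} implementation beyond the comparison quoted above.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone model of the growth check; not the Hive class itself. */
public class ZeroWbSizeDemo {

  /**
   * Counts how many buffers are appended before the memory check fires.
   * Gives up after maxSteps (a cap added for this demo only) and
   * returns -1 if the check never fired.
   */
  static int buffersUntilLimit(int wbSize, long maxSize, int maxSteps) {
    List<byte[]> writeBuffers = new ArrayList<>();
    for (int step = 0; step < maxSteps; step++) {
      // Same comparison as in nextBufferToWrite()
      if ((1 + writeBuffers.size()) * ((long) wbSize) > maxSize) {
        return writeBuffers.size();
      }
      writeBuffers.add(new byte[wbSize]);
    }
    return -1;
  }

  public static void main(String[] args) {
    // Positive buffer size: the check fires after a few buffers.
    System.out.println(buffersUntilLimit(1024, 4096, 1_000)); // prints 4
    // Zero buffer size: the check never fires within the cap.
    System.out.println(buffersUntilLimit(0, 4096, 1_000));    // prints -1
  }
}
```

With a positive {{wbSize}} the limit is reached after a handful of buffers, while with {{wbSize}} equal to 0 the product stays at 0 and only the artificial cap stops the loop.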
h2. How to reproduce:

(1) Set {{hive.mapjoin.optimized.hashtable.wbsize}} to 0
(2) Run test 
{{org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator#testMultiKey2}}
For an easy reproduction, run the {{reproduce.sh}} in the attachment.
h2. StackTrace:
{noformat}
java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
        at java.base/java.util.ArrayList.grow(ArrayList.java:238)
        at java.base/java.util.ArrayList.grow(ArrayList.java:243)
        at java.base/java.util.ArrayList.add(ArrayList.java:486)
        at java.base/java.util.ArrayList.add(ArrayList.java:499)
        at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
        at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
        at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:222)
        at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:424)
        at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:461)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.loadTableContainerData(MapJoinTestConfig.java:794)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoin(MapJoinTestConfig.java:846)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoinImplementation(MapJoinTestConfig.java:997)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoinImplementation(MapJoinTestConfig.java:971)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTestImplementation(TestMapJoinOperator.java:1968)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeRowModeOptimized(TestMapJoinOperator.java:1906)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.doExecuteTest(TestMapJoinOperator.java:1859)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTestInner(TestMapJoinOperator.java:1807)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTest(TestMapJoinOperator.java:1783)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.doTestMultiKey2(TestMapJoinOperator.java:1144)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.testMultiKey2(TestMapJoinOperator.java:1076){noformat}

We are happy to provide a patch if this issue is confirmed.
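
As a rough idea of what such a guard could look like, the sketch below rejects non-positive buffer sizes up front. It is an assumption about one possible approach, not the actual Hive change: {{WbSizeGuard}} and {{requirePositiveWbSize}} are hypothetical names, and where such a check would be wired into Hive is left open.

```java
/** Hypothetical helper; names are illustrative and not part of Hive. */
public class WbSizeGuard {

  /** Returns wbSize unchanged if it is positive, otherwise throws. */
  static int requirePositiveWbSize(int wbSize) {
    if (wbSize <= 0) {
      throw new IllegalArgumentException(
          "hive.mapjoin.optimized.hashtable.wbsize must be positive, got " + wbSize);
    }
    return wbSize;
  }

  public static void main(String[] args) {
    // A positive size passes through unchanged.
    System.out.println(requirePositiveWbSize(8 * 1024 * 1024));
    // A zero size is rejected before any buffer is allocated.
    try {
      requirePositiveWbSize(0);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast on the configured value would turn the OutOfMemoryError into an immediate, descriptive error. Clamping to a minimum size instead would be another option.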



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
