[
https://issues.apache.org/jira/browse/HIVE-27519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denys Kuzmenko updated HIVE-27519:
----------------------------------
Priority: Major (was: Critical)
> Infinite array growth when optimized hashtable size is set to 0
> --------------------------------------------------------------
>
> Key: HIVE-27519
> URL: https://issues.apache.org/jira/browse/HIVE-27519
> Project: Hive
> Issue Type: Bug
> Reporter: ConfX
> Priority: Major
> Attachments: reproduce.sh
>
>
> h2. What happened:
> When the optimized hashtable write-buffer size is set to 0 via
> {{hive.mapjoin.optimized.hashtable.wbsize}}, the buffer list in
> {{WriteBuffers.java#nextBufferToWrite()}} grows without bound until the JVM
> runs out of heap space and the system crashes.
> h2. Buggy code:
> {noformat}
> private void nextBufferToWrite() {
>   if (writePos.bufferIndex == (writeBuffers.size() - 1)) {
>     if ((1 + writeBuffers.size()) * ((long)wbSize) > maxSize) { // <--- always false because wbSize is 0
>       throw new RuntimeException("Too much memory used by write buffers");
>     }
>     writeBuffers.add(new byte[wbSize]); // <---- wbSize is 0 here
>   }
>   ++writePos.bufferIndex;
>   writePos.buffer = writeBuffers.get(writePos.bufferIndex);
>   writePos.offset = 0;
> }{noformat}
> When the optimized hashtable size is set to 0, {{wbSize}} is 0, so each call
> to {{writeBuffers.add()}} appends a zero-length byte array. The condition
> {{if (writePos.bufferIndex == (writeBuffers.size() - 1))}} is true on every
> call, because {{writePos.bufferIndex}} and the size of {{writeBuffers}} both
> increase by one each time. The check {{if ((1 + writeBuffers.size()) *
> ((long)wbSize) > maxSize)}} never becomes true, because {{wbSize}} is 0, so
> the RuntimeException inside it is never thrown. The method therefore keeps
> adding zero-length byte arrays to {{writeBuffers}}, which causes an OOM and
> crashes the system.
> h2. How to reproduce:
> (1) Set {{hive.mapjoin.optimized.hashtable.wbsize}} to 0
> (2) Run test
> {{org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator#testMultiKey2}}
> For an easy reproduction, run the {{reproduce.sh}} in the attachment.
> h2. StackTrace:
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>     at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
>     at java.base/java.util.ArrayList.grow(ArrayList.java:238)
>     at java.base/java.util.ArrayList.grow(ArrayList.java:243)
>     at java.base/java.util.ArrayList.add(ArrayList.java:486)
>     at java.base/java.util.ArrayList.add(ArrayList.java:499)
>     at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
>     at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
>     at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:222)
>     at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:424)
>     at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:461)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.loadTableContainerData(MapJoinTestConfig.java:794)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoin(MapJoinTestConfig.java:846)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoinImplementation(MapJoinTestConfig.java:997)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.MapJoinTestConfig.createMapJoinImplementation(MapJoinTestConfig.java:971)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTestImplementation(TestMapJoinOperator.java:1968)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeRowModeOptimized(TestMapJoinOperator.java:1906)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.doExecuteTest(TestMapJoinOperator.java:1859)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTestInner(TestMapJoinOperator.java:1807)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.executeTest(TestMapJoinOperator.java:1783)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.doTestMultiKey2(TestMapJoinOperator.java:1144)
>     at org.apache.hadoop.hive.ql.exec.vector.mapjoin.TestMapJoinOperator.testMultiKey2(TestMapJoinOperator.java:1076){noformat}
> We are happy to provide a patch if this issue is confirmed.
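
The growth described in the report can be modelled outside Hive. The sketch below is a simplified, hypothetical stand-in and not the actual WriteBuffers class: the class name, the allocationsNeeded and validateWbSize helpers, and the maxAllocations cap (added only so the demo terminates) are all assumptions. It reuses the size check from the buggy code to show that a zero wbSize never trips the limit, and shows one possible guard, which is not necessarily the fix that was committed.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the buffer-allocation loop; not the Hive class. */
final class WriteBufferGrowthSketch {

    /**
     * Counts how many buffers are allocated while trying to store bytesToWrite bytes.
     * maxAllocations is an artificial cap so that the demo terminates when wbSize is 0.
     */
    static int allocationsNeeded(int wbSize, long maxSize, int bytesToWrite, int maxAllocations) {
        List<byte[]> writeBuffers = new ArrayList<>();
        int remaining = bytesToWrite;
        while (remaining > 0 && writeBuffers.size() < maxAllocations) {
            // Same size check as in the report: with wbSize == 0 the product is 0,
            // so the limit is never exceeded and the exception is never thrown.
            if ((1 + writeBuffers.size()) * ((long) wbSize) > maxSize) {
                throw new RuntimeException("Too much memory used by write buffers");
            }
            byte[] buffer = new byte[wbSize];
            writeBuffers.add(buffer);
            // A zero-length buffer stores nothing, so remaining never shrinks.
            remaining -= Math.min(remaining, buffer.length);
        }
        return writeBuffers.size();
    }

    /** One possible guard (an assumption, not the committed fix): reject non-positive sizes. */
    static int validateWbSize(int wbSize) {
        if (wbSize <= 0) {
            throw new IllegalArgumentException("wbSize must be positive, got " + wbSize);
        }
        return wbSize;
    }

    public static void main(String[] args) {
        System.out.println(allocationsNeeded(8, 1024, 20, 1_000)); // prints 3
        System.out.println(allocationsNeeded(0, 1024, 20, 1_000)); // prints 1000 (stopped by the cap)
    }
}
```

With a positive wbSize the loop ends once the data fits or the memory limit is hit. With 0 only the artificial cap stops it, which is why validating the configured value up front is one plausible way to fail fast.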
--
This message was sent by Atlassian Jira
(v8.20.10#820010)