cxzl25 opened a new pull request, #1412:
URL: https://github.com/apache/orc/pull/1412

   ### What changes were proposed in this pull request?
   
   When the `chunkIndex` calculation in `DynamicByteArray` overflows, an NPE is thrown.
   This PR adds an error log that tells users which ORC parameters they can configure to avoid the problem.
   
   ### Why are the changes needed?
   
   When the written string is very large, the argument passed to `grow` may overflow, so the array is not expanded and the subsequent copy throws an NPE.
   
   org.apache.orc.impl.DynamicByteArray#add(byte[], int, int)
   ```java
   grow((length + valueLength) / chunkSize);
   ```
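
The arithmetic can be reproduced in isolation. The sketch below is not ORC code: the class and method names are hypothetical, and it only mirrors the `int` expression above using the sizes from the local test in this PR (1024 bytes already written, a `Integer.MAX_VALUE - 16` byte value, 32 KiB chunks). The `long` variant is shown only for comparison and is not necessarily what this patch does.

```java
public class ChunkIndexOverflowDemo {
  // Mirrors the int expression from the PR: (length + valueLength) / chunkSize.
  static int chunkIndexInt(int length, int valueLength, int chunkSize) {
    // The int sum can wrap around to a negative value before the division.
    return (length + valueLength) / chunkSize;
  }

  // Hypothetical comparison: widen to long before adding so the sum cannot wrap.
  static long chunkIndexLong(int length, int valueLength, int chunkSize) {
    return ((long) length + valueLength) / chunkSize;
  }

  public static void main(String[] args) {
    int chunkSize = 32 * 1024;
    int length = 1024;                        // bytes already stored
    int valueLength = Integer.MAX_VALUE - 16; // size of the large value

    System.out.println(chunkIndexInt(length, valueLength, chunkSize));  // -65535
    System.out.println(chunkIndexLong(length, valueLength, chunkSize)); // 65536
  }
}
```

The negative result matches the `chunkIndex` value reported in the error log of this PR.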
   
   #### Log
   
   ```java
   Caused by: java.lang.NullPointerException
        at java.lang.System.arraycopy(Native Method)
        at org.apache.orc.impl.DynamicByteArray.add(DynamicByteArray.java:115)
        at org.apache.orc.impl.StringRedBlackTree.addNewKey(StringRedBlackTree.java:48)
        at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:60)
        at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:69)
        at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
        at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:696)
   ```
   #### Local Test
   org.apache.orc.impl.TestDynamicArray#testBigByteArray
   ```java
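   @Test
   public void testBigByteArray() {
     DynamicByteArray dba = new DynamicByteArray(128, 32 * 1024);

     // Store 1024 bytes first so the array already holds some data.
     byte[] val = new byte[1024];
     dba.add(val, 0, val.length);

     // 1024 + (Integer.MAX_VALUE - 16) exceeds Integer.MAX_VALUE and overflows int.
     byte[] bigVal = new byte[Integer.MAX_VALUE - 16];
     dba.add(bigVal, 0, bigVal.length);
   }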
   ```
   
   ### How was this patch tested?
   Tested locally with `TestDynamicArray#testBigByteArray`.
   
   Output:
   ```bash
   2023-02-15 20:25:16,938 [main] ERROR DynamicByteArray: chunkIndex overflow:-65535. You can adjust the relevant configuration: orc.column.encoding.direct,orc.dictionary.key.threshold.
   ```
   
   

