[GitHub] [orc] cxzl25 commented on a diff in pull request #1412: ORC-1373: Add log when DynamicByteArray length overflow

via GitHub Wed, 15 Feb 2023 23:38:35 -0800


cxzl25 commented on code in PR #1412:
URL: https://github.com/apache/orc/pull/1412#discussion_r1108110140



##########
java/core/src/java/org/apache/orc/impl/DynamicByteArray.java:
##########
@@ -59,10 +65,16 @@ private void grow(int chunkIndex) {
         int newSize = Math.max(chunkIndex + 1, 2 * data.length);
         data = Arrays.copyOf(data, newSize);
       }
-      for(int i=initializedChunks; i <= chunkIndex; ++i) {
+      for (int i = initializedChunks; i <= chunkIndex; ++i) {
         data[i] = new byte[chunkSize];
       }
       initializedChunks = chunkIndex + 1;
+    } else if (chunkIndex < 0) {
+      LOG.error("chunkIndex overflow:{}. You can adjust the relevant configuration: {},{}.",

Review Comment:
   This problem usually shows up in production. When it does, I usually set `orc.dictionary.key.threshold=0`.  
   Alternatively, find which field holds the large strings and skip dictionary encoding for it with `orc.column.encoding.direct=columnName`.   
   Because it is sometimes difficult to find which field that is, we can instead configure `orc.column.encoding.direct=*`, which is equivalent to `orc.dictionary.key.threshold=0`.
   
   
   How about this?
   ```bash
   2023-02-15 23:37:26,658 [main] ERROR DynamicByteArray: chunkIndex overflow:-65535. You can set orc.column.encoding.direct=columnName, or orc.dictionary.key.threshold=0 to turn off dictionary encoding.
   ```
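
For readers wondering how `chunkIndex` can be negative at all, here is a minimal sketch of the arithmetic. It assumes the index is computed as `(length + valueLength) / chunkSize` on `int` values and that the chunk size is 32 KiB; neither detail appears in the hunk above, and the class and method names are hypothetical. Under those assumptions, a sum that wraps just past `Integer.MAX_VALUE` yields the same `-65535` shown in the sample log line.

```java
public class ChunkIndexOverflowSketch {
    // Assumed chunk size (32 KiB); the diff above does not state it.
    static final int CHUNK_SIZE = 32 * 1024;

    // Assumed shape of the computation: the int sum can wrap to a negative
    // value once the buffered bytes approach Integer.MAX_VALUE.
    static int chunkIndexFor(int length, int valueLength) {
        return (length + valueLength) / CHUNK_SIZE;
    }

    // Same computation widened to long, shown only for comparison.
    static long exactChunkIndexFor(int length, int valueLength) {
        return ((long) length + valueLength) / CHUNK_SIZE;
    }

    public static void main(String[] args) {
        int length = Integer.MAX_VALUE - 1000; // bytes already buffered
        int valueLength = 2000;                // size of the next value

        // The int sum wraps around, so the index comes out negative.
        System.out.println("int index:  " + chunkIndexFor(length, valueLength));
        // The widened sum gives the index that was actually needed.
        System.out.println("long index: " + exactChunkIndexFor(length, valueLength));
    }
}
```

Widening to `long` only shows what the index would have been. A Java array cannot hold that many bytes in a single dictionary anyway, which is why the comment suggests disabling dictionary encoding rather than changing the arithmetic.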



