cxzl25 commented on code in PR #1412:
URL: https://github.com/apache/orc/pull/1412#discussion_r1108110140
##########
java/core/src/java/org/apache/orc/impl/DynamicByteArray.java:
##########
@@ -59,10 +65,16 @@ private void grow(int chunkIndex) {
int newSize = Math.max(chunkIndex + 1, 2 * data.length);
data = Arrays.copyOf(data, newSize);
}
- for(int i=initializedChunks; i <= chunkIndex; ++i) {
+ for (int i = initializedChunks; i <= chunkIndex; ++i) {
data[i] = new byte[chunkSize];
}
initializedChunks = chunkIndex + 1;
+ } else if (chunkIndex < 0) {
LOG.error("chunkIndex overflow:{}. You can adjust the relevant configuration: {},{}.",
Review Comment:
This problem usually shows up in production, where I usually set
`orc.dictionary.key.threshold=0`.
Alternatively, find which field holds the large strings and skip dictionary
encoding for it with `orc.column.encoding.direct=columnName`.
Because it is sometimes hard to tell which field holds the large strings, we
can configure `orc.column.encoding.direct=*` instead, which is equivalent
to `orc.dictionary.key.threshold=0`.
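As a rough sketch of the two workarounds, assuming the settings reach the writer as plain key/value properties: the class name, the helper names, and the `payload` column below are hypothetical, and `java.util.Properties` only stands in for whatever configuration object the job actually uses.

```java
import java.util.Properties;

// Hypothetical helper; only the two property keys come from this discussion.
final class DictionaryWorkarounds {
  // Turn off dictionary encoding for every column.
  static Properties disableDictionary() {
    Properties props = new Properties();
    props.setProperty("orc.dictionary.key.threshold", "0");
    return props;
  }

  // Use direct encoding only for the named column.
  static Properties directEncodingFor(String columnName) {
    Properties props = new Properties();
    props.setProperty("orc.column.encoding.direct", columnName);
    return props;
  }

  public static void main(String[] args) {
    System.out.println(disableDictionary());
    System.out.println(directEncodingFor("payload"));
  }
}
```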
How about this?
```bash
2023-02-15 23:37:26,658 [main] ERROR DynamicByteArray: chunkIndex overflow:-65535. You can set orc.column.encoding.direct=columnName, or orc.dictionary.key.threshold=0 to turn off dictionary encoding.
```
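For context on how the index can go negative at all, here is a minimal sketch. It assumes the chunk index is computed by dividing an `int` byte count by the chunk size, and it assumes a 32 KiB chunk size; neither is visible in the hunk above, and the class and method names are made up for illustration.

```java
// Illustration only: shows int wrap-around producing a negative chunk index.
final class ChunkIndexOverflow {
  // Assumed chunk size (32 KiB); not taken from the diff.
  static final int CHUNK_SIZE = 32 * 1024;

  // Assumed computation in int arithmetic: the sum can wrap past
  // Integer.MAX_VALUE and become negative before the division.
  static int chunkIndexFor(int length, int valueLength) {
    return (length + valueLength) / CHUNK_SIZE;
  }

  // Same computation widened to long, which does not wrap for int inputs.
  static long chunkIndexForLong(int length, int valueLength) {
    return ((long) length + valueLength) / CHUNK_SIZE;
  }

  public static void main(String[] args) {
    int length = Integer.MAX_VALUE - 99; // buffer already close to 2 GiB
    int valueLength = 101;               // next value pushes the sum past the int range
    System.out.println(chunkIndexFor(length, valueLength));     // negative index
    System.out.println(chunkIndexForLong(length, valueLength)); // positive index
  }
}
```

Under those assumptions the `int` version yields -65535 for these inputs, the same value as in the sample message above, while the `long` version yields 65536.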