cxzl25 commented on code in PR #1108:
URL: https://github.com/apache/orc/pull/1108#discussion_r869520001


##########
java/core/src/java/org/apache/orc/impl/TreeReaderFactory.java:
##########
@@ -1994,7 +1995,18 @@ private static byte[] commonReadByteArrays(InStream stream, IntegerReader length
           totalLength = (int) (batchSize * scratchlcv.vector[0]);
         }
       }
-
+      if (totalLength < 0) {
+        StringBuilder sb = new StringBuilder("totalLength:" + totalLength
+                + " is a negative number.");
+        if (batchSize > 1) {

Review Comment:
   It makes sense to add this exception.
   Because when a user encounters the `NegativeArraySizeException`, they have no idea what to do; the only way to discover that the batch size can be adjusted is to read the ORC source code.
   
   I can remove it from this PR first and then file a separate JIRA to add this exception.
   
   I encountered this problem when reading an ORC data source in Spark 3.2: because the ORC schema is complex, the default is to read it with `OrcMapreduceRecordReader`.
   Then I set the following configuration
   ```
   spark.sql.orc.enableNestedColumnVectorizedReader=true
   spark.sql.orc.columnarReaderBatchSize=1
   ```
   which worked around the problem temporarily.
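   The overflow that the new check guards against can be reproduced in isolation. Below is a minimal sketch (not ORC's actual code; the class name, `computeTotalLength` helper, and sizes are illustrative) of how the narrowing cast in `totalLength = (int) (batchSize * scratchlcv.vector[0])` wraps negative once the product exceeds `Integer.MAX_VALUE`, which is what later surfaces as the `NegativeArraySizeException`:
   ```java
   // Illustrative only -- not ORC's actual code. Mirrors the pattern
   // `totalLength = (int) (batchSize * scratchlcv.vector[0])` from the diff.
   public class TotalLengthOverflowDemo {

       // elementLength is a long, like ORC's scratchlcv.vector[0]
       static int computeTotalLength(int batchSize, long elementLength) {
           // the narrowing cast silently wraps when the product
           // exceeds Integer.MAX_VALUE (2_147_483_647)
           return (int) (batchSize * elementLength);
       }

       public static void main(String[] args) {
           // 2048 rows of 1 MiB byte arrays: 2048 * 1_048_576 = 2^31,
           // which wraps to Integer.MIN_VALUE
           System.out.println(computeTotalLength(2048, 1024L * 1024)); // -2147483648

           // a batch size of 1 keeps the product in 32-bit range
           System.out.println(computeTotalLength(1, 1024L * 1024)); // 1048576
       }
   }
   ```
   This is also why shrinking `spark.sql.orc.columnarReaderBatchSize` avoids the crash: a smaller batch size keeps the product within the 32-bit range.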



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
