[GitHub] [iceberg] rdblue commented on a change in pull request #2933: Follow-up on the "Add Vectorized Reader pr 2286"

GitBox Wed, 04 Aug 2021 08:33:46 -0700


rdblue commented on a change in pull request #2933:
URL: https://github.com/apache/iceberg/pull/2933#discussion_r682729811




##########
File path: arrow/src/main/java/org/apache/iceberg/arrow/vectorized/ArrowReader.java
##########
@@ -197,26 +227,41 @@ public void close() throws IOException {
      *                          in the previous {@link Iterator#next()} call are closed before creating
      *                          new instances if the current {@link Iterator#next()}.
      */
-    VectorizedCombinedScanIterator(
-        CloseableIterable<CombinedScanTask> tasks,
-        Schema expectedSchema,
-        String nameMapping,
-        FileIO io,
-        EncryptionManager encryptionManager,
-        boolean caseSensitive,
-        int batchSize,
-        boolean reuseContainers) {
-      this.fileTasks = StreamSupport.stream(tasks.spliterator(), false)
-          .map(CombinedScanTask::files)
-          .flatMap(Collection::stream)
-          .collect(Collectors.toList());
+    VectorizedCombinedScanIterator(CloseableIterable<CombinedScanTask> tasks,
+                                   Schema expectedSchema,
+                                   String nameMapping,
+                                   FileIO io,
+                                   EncryptionManager encryptionManager,
+                                   boolean caseSensitive,
+                                   int batchSize,
+                                   boolean reuseContainers) {
+      List<FileScanTask> fileTasks = StreamSupport.stream(tasks.spliterator(), false)
+              .map(CombinedScanTask::files)
+              .flatMap(Collection::stream)
+              .collect(Collectors.toList());
       this.fileItr = fileTasks.iterator();
 
+      boolean atLeastOneColumn = expectedSchema.columns().size() > 0;
+      boolean hasNoDeleteFiles = fileTasks.stream().noneMatch(TableScanUtil::hasDeletes);
+      boolean hasSupportedTypes = expectedSchema.columns().stream()
+              .map(c -> c.type().typeId())
+              .allMatch(SUPPORTED_TYPES::contains);
+      if (!atLeastOneColumn || !hasNoDeleteFiles || !hasSupportedTypes) {
+        throw new UnsupportedOperationException(
+                "ArrowReader is supported for the query schema with at least one column," +
+                        " with no delete files and for supported data types" +
+                        ", but found that atLeastOneColumn=" + atLeastOneColumn +
+                        ", hasNoDeleteFiles=" + hasNoDeleteFiles +
+                        ", hasSupportedTypes=" + hasSupportedTypes +
+                        ", supported types=" + SUPPORTED_TYPES +

Review comment:
   This error message isn't very helpful: it checks several conditions and
reports them together, so the person reading the error has to work out what the
code already knows here, namely whether the failure was caused by delete files,
unsupported types, or a missing projected column.

   This should be split into 3 separate checks with specific error messages,
like "Cannot read files that require applying delete files: <split>", "Cannot
read without at least one projected column", and "Cannot read unsupported
column types: <unsupported-types>". The last one should list the types that
are used but not supported, since that is already known.
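
   For illustration, the three checks could be split roughly as follows. This
is a self-contained sketch, not the ArrowReader code: `ScanValidation`,
`TypeId`, and `Split` are hypothetical stand-ins for the Iceberg schema types
and `FileScanTask`, and the contents of `SUPPORTED_TYPES` here are an assumed
subset.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: none of these names come from the pull request.
final class ScanValidation {
  // Stand-in for Iceberg type IDs; only a few values, for illustration.
  enum TypeId { BOOLEAN, INTEGER, LONG, STRING, MAP, STRUCT }

  // Assumed subset; the real SUPPORTED_TYPES in ArrowReader may differ.
  static final Set<TypeId> SUPPORTED_TYPES =
      EnumSet.of(TypeId.BOOLEAN, TypeId.INTEGER, TypeId.LONG, TypeId.STRING);

  // Stand-in for a file scan task: a path and whether delete files apply to it.
  static final class Split {
    final String path;
    final boolean hasDeletes;

    Split(String path, boolean hasDeletes) {
      this.path = path;
      this.hasDeletes = hasDeletes;
    }

    @Override
    public String toString() {
      return path;
    }
  }

  private ScanValidation() {
  }

  // One check per failure cause, each with its own specific message.
  static void validate(List<TypeId> columnTypes, List<Split> splits) {
    if (columnTypes.isEmpty()) {
      throw new UnsupportedOperationException(
          "Cannot read without at least one projected column");
    }

    // Collect only the types that are used but not supported.
    Set<TypeId> unsupported = columnTypes.stream()
        .filter(type -> !SUPPORTED_TYPES.contains(type))
        .collect(Collectors.toCollection(() -> EnumSet.noneOf(TypeId.class)));
    if (!unsupported.isEmpty()) {
      throw new UnsupportedOperationException(
          "Cannot read unsupported column types: " + unsupported);
    }

    // Report the first split that would require applying delete files.
    for (Split split : splits) {
      if (split.hasDeletes) {
        throw new UnsupportedOperationException(
            "Cannot read files that require applying delete files: " + split);
      }
    }
  }
}
```

   With checks separated like this, each exception names exactly one cause,
and the unsupported-types message carries only the offending types rather than
the full supported set.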




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



