[GitHub] [iceberg] bryanck commented on a diff in pull request #5934: Spark 3.3: Split SparkScan and SparkBatch

GitBox Fri, 07 Oct 2022 16:14:44 -0700


bryanck commented on code in PR #5934:
URL: https://github.com/apache/iceberg/pull/5934#discussion_r990546078



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatch.java:
##########
@@ -115,18 +119,37 @@ private boolean parquetBatchReadsEnabled() {
   }
 
   private boolean orcOnly() {
-    return tasks().stream()
+    return taskGroups.stream()
         .allMatch(task -> !task.isDataTask() && onlyFileFormat(task, FileFormat.ORC));
   }
 
   private boolean orcBatchReadsEnabled() {
     return readConf.orcVectorizationEnabled()
         && // vectorization enabled
-        tasks().stream().noneMatch(TableScanUtil::hasDeletes); // no delete files
+        taskGroups.stream().noneMatch(TableScanUtil::hasDeletes); // no delete files
   }
 
   private boolean onlyFileFormat(CombinedScanTask task, FileFormat fileFormat) {
     return task.files().stream()
         .allMatch(fileScanTask -> fileScanTask.file().format().equals(fileFormat));
   }
+
+  @Override
+  public boolean equals(Object o) {

Review Comment:
   One note: I backported this change to Spark 3.2 and ran it there, because
Spark 3.3 has a performance regression (https://issues.apache.org/jira/browse/SPARK-40703).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


