bryanck commented on code in PR #5934:
URL: https://github.com/apache/iceberg/pull/5934#discussion_r990545763
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatch.java:
##########
@@ -115,18 +119,37 @@ private boolean parquetBatchReadsEnabled() {
}
private boolean orcOnly() {
- return tasks().stream()
+ return taskGroups.stream()
.allMatch(task -> !task.isDataTask() && onlyFileFormat(task, FileFormat.ORC));
}
private boolean orcBatchReadsEnabled() {
return readConf.orcVectorizationEnabled()
&& // vectorization enabled
- tasks().stream().noneMatch(TableScanUtil::hasDeletes); // no delete files
+ taskGroups.stream().noneMatch(TableScanUtil::hasDeletes); // no delete files
}
private boolean onlyFileFormat(CombinedScanTask task, FileFormat fileFormat) {
return task.files().stream()
.allMatch(fileScanTask -> fileScanTask.file().format().equals(fileFormat));
}
+
+ @Override
+ public boolean equals(Object o) {
Review Comment:
I ran the benchmark and this change doesn't have any negative impact on
performance, so LGTM!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]