[GitHub] [iceberg] jerryshao commented on a change in pull request #796: Support Spark Structured Streaming Read for Iceberg

GitBox Mon, 21 Sep 2020 05:55:48 -0700


jerryshao commented on a change in pull request #796:
URL: https://github.com/apache/iceberg/pull/796#discussion_r492020991




##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -388,6 +389,34 @@ private static void mergeIcebergHadoopConfs(
     return tasks;
   }
 
+  protected boolean checkEnableBatchRead(List<CombinedScanTask> taskList) {
+    boolean allParquetFileScanTasks =
+        taskList.stream()
+            .allMatch(combinedScanTask -> !combinedScanTask.isDataTask() && combinedScanTask.files()
+                .stream()
+                .allMatch(fileScanTask -> fileScanTask.file().format().equals(
+                    FileFormat.PARQUET)));
+
+    boolean allOrcFileScanTasks =
+        taskList.stream()
+            .allMatch(combinedScanTask -> !combinedScanTask.isDataTask() && combinedScanTask.files()
+                .stream()
+                .allMatch(fileScanTask -> fileScanTask.file().format().equals(
+                    FileFormat.ORC)));
+
+    boolean atLeastOneColumn = lazySchema().columns().size() > 0;
+
+    boolean hasNoIdentityProjections = taskList.stream()

Review comment:
The original code evaluates the `batchReadsEnabled` check when `toString()` is
called. That is too early for the streaming code, because `offset` has not
been calculated yet. So the common parts of the check are extracted into this
method, and the streaming code should override it.
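
To illustrate the idea, here is a minimal sketch of the override pattern. It is not the code from this pull request. `BatchReadSketch`, `BatchReader`, `StreamingReader`, `Format`, `enableBatchRead()` and `setOffsetRange()` are hypothetical stand-ins, and the task list is reduced to a list of file formats. Only the name `checkEnableBatchRead` and the "all Parquet or all ORC" test come from the diff above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in classes; not the real Iceberg Reader.
class BatchReadSketch {
  enum Format { PARQUET, ORC, AVRO }

  // Batch case: the tasks are known up front, so toString() may run the check.
  static class BatchReader {
    private final List<Format> plannedTasks;

    BatchReader(List<Format> plannedTasks) {
      this.plannedTasks = plannedTasks;
    }

    // Common part: depends only on the task list that is passed in.
    protected boolean checkEnableBatchRead(List<Format> taskList) {
      boolean allParquet = taskList.stream().allMatch(f -> f == Format.PARQUET);
      boolean allOrc = taskList.stream().allMatch(f -> f == Format.ORC);
      return allParquet || allOrc;
    }

    public boolean enableBatchRead() {
      return checkEnableBatchRead(plannedTasks);
    }

    @Override
    public String toString() {
      return "Scan(batchedScan=" + enableBatchRead() + ")";
    }
  }

  // Streaming case: the tasks are unknown until the offset range is set.
  static class StreamingReader extends BatchReader {
    private List<Format> tasksInOffsetRange; // null until setOffsetRange()

    StreamingReader() {
      super(Collections.emptyList());
    }

    void setOffsetRange(List<Format> tasks) {
      this.tasksInOffsetRange = tasks;
    }

    // Assumption for this sketch: report false until the offsets are known,
    // then reuse the shared check on the tasks for that range.
    @Override
    public boolean enableBatchRead() {
      if (tasksInOffsetRange == null) {
        return false;
      }
      return checkEnableBatchRead(tasksInOffsetRange);
    }
  }

  public static void main(String[] args) {
    StreamingReader reader = new StreamingReader();
    System.out.println(reader); // safe to call before the offsets are known
    reader.setOffsetRange(Arrays.asList(Format.PARQUET, Format.PARQUET));
    System.out.println(reader);
  }
}
```

With this split, an early `toString()` call on the streaming reader does not evaluate the check against tasks that do not exist yet, and the format test itself lives in one place.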



