RussellSpitzer commented on a change in pull request #1421:
URL: https://github.com/apache/iceberg/pull/1421#discussion_r499699567
##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -385,16 +387,40 @@ private static void mergeIcebergHadoopConfs(
}
}
- try (CloseableIterable<CombinedScanTask> tasksIterable = scan.planTasks()) {
- this.tasks = Lists.newArrayList(tasksIterable);
- } catch (IOException e) {
- throw new RuntimeIOException(e, "Failed to close table scan: %s", scan);
- }
+ this.tasks = planScan(scan);
}
-
return tasks;
}
+ private List<CombinedScanTask> planScan(TableScan scan) {
+ // TODO Need to only use distributed planner for supported implementations and add some heuristics about when
+ // to use
+ if (scan instanceof DataTableScan) {
+ return planDistributedScan(scan);
+ } else {
+ return planLocalScan(scan);
+ }
+ }
+
+ private List<CombinedScanTask> planDistributedScan(TableScan scan) {
+ List<CombinedScanTask> result;
+ try {
Review comment:
Ideally, I think we would run local and distributed planning in parallel
and use whichever returns first, but that could be a future improvement.
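
A rough sketch of the "run both, take whichever returns first" idea, using only the JDK's `ExecutorService.invokeAny`. The class name `PlannerRace`, the helper `planWithFirstToFinish`, and the two stand-in planners are hypothetical and not part of this PR; in `Reader` the callables would presumably wrap `planLocalScan(scan)` and `planDistributedScan(scan)`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; none of these names exist in Iceberg.
class PlannerRace {

  // Runs both planners concurrently and returns the result of whichever
  // completes successfully first.
  static <T> T planWithFirstToFinish(Callable<T> localPlanner, Callable<T> distributedPlanner)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // invokeAny returns the first successful result and cancels the
      // tasks that have not completed yet.
      return pool.invokeAny(Arrays.asList(localPlanner, distributedPlanner));
    } finally {
      // Interrupt the losing planner's thread, if it is still running.
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-ins for the real planners: the "local" one is artificially slow.
    Callable<List<String>> slowLocal = () -> {
      Thread.sleep(2000);
      return Arrays.asList("local-task");
    };
    Callable<List<String>> fastDistributed = () -> Arrays.asList("distributed-task");

    System.out.println(planWithFirstToFinish(slowLocal, fastDistributed));
  }
}
```

Two caveats with this approach: the losing planner's work is wasted, and cancellation here is only a thread interrupt, so a distributed planner that has already submitted a Spark job would probably need explicit job cancellation as well.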