rdblue commented on a change in pull request #3817:
URL: https://github.com/apache/iceberg/pull/3817#discussion_r776513086



##########
File path: flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSplitGenerator.java
##########
@@ -38,7 +48,22 @@ private FlinkSplitGenerator() {
     List<CombinedScanTask> tasks = tasks(table, context);
     FlinkInputSplit[] splits = new FlinkInputSplit[tasks.size()];
     for (int i = 0; i < tasks.size(); i++) {
-      splits[i] = new FlinkInputSplit(i, tasks.get(i));
+      Set<String> hosts = Sets.newHashSet();
+      CombinedScanTask combinedScanTask = tasks.get(i);
+      combinedScanTask.files().forEach(fileScanTask -> {
+        try {
+          final FileSystem fs = new HadoopFileSystem(DistributedFileSystem.get(new Configuration()));

Review comment:
       This should do the same thing that Spark does. Specifically:
   * There should be a flag to enable locality
   * The flag should be defaulted based on whether the scheme identifies a FS that has locality
   * `new Configuration()` should not be called
   * Use `Util.blockLocations` to get block locations
   * Ideally, parallelize the block lookup per split because it can get expensive
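
   The checklist above could be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual Iceberg/Flink code: `LocalitySketch`, `lookupBlockHosts`, and the hard-coded scheme set are hypothetical stand-ins for the real table/task types and for `Util.blockLocations`, and real code would consult the filesystem rather than a fixed list of schemes.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;
   import java.util.concurrent.ExecutionException;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Future;
   import java.util.stream.Collectors;

   // Hypothetical sketch of the locality handling described in the review.
   public class LocalitySketch {

     // Schemes assumed to expose block locality (illustrative only).
     private static final Set<String> LOCALITY_SCHEMES = Set.of("hdfs");

     // Default for the locality flag: enabled only when the FS scheme
     // is one that can report block locations.
     static boolean localityDefault(String scheme) {
       return scheme != null && LOCALITY_SCHEMES.contains(scheme.toLowerCase());
     }

     // Parallelize the per-split block-location lookup, since each lookup
     // can be an expensive round trip to the filesystem.
     static List<String[]> hostsPerSplit(List<String> splits, ExecutorService pool) {
       List<Future<String[]>> futures = splits.stream()
           .map(split -> pool.submit(() -> lookupBlockHosts(split)))
           .collect(Collectors.toList());
       List<String[]> hosts = new ArrayList<>();
       for (Future<String[]> future : futures) {
         try {
           hosts.add(future.get());
         } catch (InterruptedException | ExecutionException e) {
           throw new RuntimeException(e);
         }
       }
       return hosts;
     }

     // Stand-in for Util.blockLocations(task, conf); returns fake hosts here.
     static String[] lookupBlockHosts(String split) {
       return new String[] {"host-for-" + split};
     }
   }
   ```

   The key point of the sketch is that the split generator submits one lookup task per split to a shared pool instead of resolving block locations serially in the loop.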




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


