rdsr commented on a change in pull request #843: [WIP] InputFormat support for Iceberg
URL: https://github.com/apache/incubator-iceberg/pull/843#discussion_r397509806
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/hadoop/Util.java
 ##########
 @@ -36,4 +47,21 @@ public static FileSystem getFs(Path path, Configuration conf) {
       throw new RuntimeIOException(e, "Failed to get file system for path: %s", path);
     }
   }
+
+  public static String[] blockLocations(CombinedScanTask task, Configuration conf) {
+    Set<String> locationSets = Sets.newHashSet();
+    for (FileScanTask f : task.files()) {
+      Path path = new Path(f.file().path().toString());
+      try {
+        FileSystem fs = path.getFileSystem(conf);
+        for (BlockLocation b : fs.getFileBlockLocations(path, f.start(), f.length())) {
+          locationSets.addAll(Arrays.asList(b.getHosts()));
+        }
+      } catch (IOException ioe) {
+        LOG.warn("Failed to get block locations for path {}", path, ioe);
 
 Review comment:
   For now, I've kept it as it was for Spark. It seems like locality is not
really needed for cloud stores; it is an optimization for HDFS. I can throw an
exception and then handle it across Spark and MR if you guys think this is
necessary.
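
   To make the two options concrete, here is a minimal, self-contained sketch
of the trade-off. It is not the code in this PR: `BlockLocationSketch` and
`HostLookup` are hypothetical names, and `HostLookup` is only a stand-in for
the `FileSystem.getFileBlockLocations` call so that the example runs without
Hadoop on the classpath. `bestEffort` mirrors the current log-and-continue
behavior; `strict` shows what throwing would look like, using
`UncheckedIOException` as an assumed substitute for `RuntimeIOException`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class BlockLocationSketch {

  /** Hypothetical stand-in for the file system lookup; not a Hadoop or Iceberg API. */
  interface HostLookup {
    List<String> hostsFor(String path) throws IOException;
  }

  /** Best effort: a failed lookup is logged and skipped, so the split simply gets fewer locality hints. */
  static String[] bestEffort(List<String> paths, HostLookup lookup) {
    Set<String> hosts = new LinkedHashSet<>();
    for (String path : paths) {
      try {
        hosts.addAll(lookup.hostsFor(path));
      } catch (IOException ioe) {
        System.err.println("Failed to get block locations for path " + path + ": " + ioe.getMessage());
      }
    }
    return hosts.toArray(new String[0]);
  }

  /** Strict: the first failed lookup is rethrown, so every caller has to handle it. */
  static String[] strict(List<String> paths, HostLookup lookup) {
    Set<String> hosts = new LinkedHashSet<>();
    for (String path : paths) {
      try {
        hosts.addAll(lookup.hostsFor(path));
      } catch (IOException ioe) {
        throw new UncheckedIOException("Failed to get block locations for path: " + path, ioe);
      }
    }
    return hosts.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // Assumed behavior for illustration only: lookups for one made-up scheme fail.
    HostLookup lookup = path -> {
      if (path.startsWith("fail://")) {
        throw new IOException("lookup failed");
      }
      return Arrays.asList("host1", "host2");
    };
    List<String> paths = Arrays.asList("hdfs://nn/a.parquet", "fail://bucket/b.parquet");
    System.out.println(Arrays.toString(bestEffort(paths, lookup)));
  }
}
```

   With the best-effort variant a failure only costs locality for that file,
which fits the view that locality is an optimization. The strict variant
would need matching handling in both the Spark and MR callers.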
