rdsr commented on a change in pull request #843: [WIP] InputFormat support for Iceberg
URL: https://github.com/apache/incubator-iceberg/pull/843#discussion_r397509806
##########
File path: core/src/main/java/org/apache/iceberg/hadoop/Util.java
##########
@@ -36,4 +47,21 @@ public static FileSystem getFs(Path path, Configuration conf) {
       throw new RuntimeIOException(e, "Failed to get file system for path: %s", path);
     }
   }
+
+  public static String[] blockLocations(CombinedScanTask task, Configuration conf) {
+    Set<String> locationSets = Sets.newHashSet();
+    for (FileScanTask f : task.files()) {
+      Path path = new Path(f.file().path().toString());
+      try {
+        FileSystem fs = path.getFileSystem(conf);
+        for (BlockLocation b : fs.getFileBlockLocations(path, f.start(), f.length())) {
+          locationSets.addAll(Arrays.asList(b.getHosts()));
+        }
+      } catch (IOException ioe) {
+        LOG.warn("Failed to get block locations for path {}", path, ioe);

Review comment:
For now, I've kept it as it was for Spark. It seems that locality is not really needed for cloud stores and is only an optimization for HDFS. I can throw an exception and then handle it across Spark and MR if you think that is necessary.
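
The two options weighed in the review comment (log and continue, versus throw and let the caller handle it) can be sketched without Hadoop on the classpath. This is a minimal illustration only, not the code in the pull request: `BlockLocationSketch`, `HostLookup`, `bestEffort`, and `strict` are hypothetical names, `HostLookup` stands in for the `FileSystem#getFileBlockLocations` call, and the JDK's `UncheckedIOException` stands in for Iceberg's `RuntimeIOException`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BlockLocationSketch {

  /** Hypothetical stand-in for the file system's block location lookup. */
  public interface HostLookup {
    String[] hostsFor(String path) throws IOException;
  }

  /** Best-effort variant: a failed lookup is reported and that path is skipped. */
  public static String[] bestEffort(List<String> paths, HostLookup lookup) {
    Set<String> hosts = new LinkedHashSet<>();
    for (String path : paths) {
      try {
        hosts.addAll(Arrays.asList(lookup.hostsFor(path)));
      } catch (IOException ioe) {
        // Locality is only a scheduling hint, so losing it for one path is tolerable.
        System.err.println("Failed to get block locations for path " + path + ": " + ioe.getMessage());
      }
    }
    return hosts.toArray(new String[0]);
  }

  /** Strict variant: the first failed lookup is rethrown for the caller to handle. */
  public static String[] strict(List<String> paths, HostLookup lookup) {
    Set<String> hosts = new LinkedHashSet<>();
    for (String path : paths) {
      try {
        hosts.addAll(Arrays.asList(lookup.hostsFor(path)));
      } catch (IOException ioe) {
        throw new UncheckedIOException("Failed to get block locations for path: " + path, ioe);
      }
    }
    return hosts.toArray(new String[0]);
  }
}
```

With the best-effort variant, a store that cannot report locations simply contributes no hosts and split planning continues. With the strict variant, every caller (Spark and MR in this discussion) would need its own handling for the exception.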