[GitHub] [iceberg] rdblue commented on a change in pull request #2678: Core: add HadoopConfigurable interface to serialize custom FileIO

GitBox Tue, 22 Jun 2021 12:23:18 -0700


rdblue commented on a change in pull request #2678:
URL: https://github.com/apache/iceberg/pull/2678#discussion_r656515650




##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -132,7 +132,7 @@
     this.splitLookback = 
options.get(SparkReadOptions.LOOKBACK).map(Integer::parseInt).orElse(null);
     this.splitOpenFileCost = 
options.get(SparkReadOptions.FILE_OPEN_COST).map(Long::parseLong).orElse(null);
 
-    if (table.io() instanceof HadoopFileIO) {
+    if (table.io() instanceof HadoopConfigurable) {

Review comment:
       I think it's a reasonable idea to be able to set a property somewhere 
and use locality. But this specifically should check for `HadoopFileIO` until 
we know what that property does and where it is because the locality code path 
is going to get a `FileSystem` instance. If you aren't using `HadoopFileIO` 
then there is no guarantee that we can get a file system or that it is the same 
as the IO instance.
   
   Let's fix the locality problem later and keep this commit focused on updates 
to add `HadoopConfigurable`. With just that as the goal of this PR, I think 
this should be left as `HadoopFileIO`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on a change in pull request #2678: Core: add HadoopConfigurable interface to serialize custom FileIO

Reply via email to