jackye1995 commented on a change in pull request #2678:
URL: https://github.com/apache/iceberg/pull/2678#discussion_r655862496



##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -132,7 +132,7 @@
     this.splitLookback = options.get(SparkReadOptions.LOOKBACK).map(Integer::parseInt).orElse(null);
     this.splitOpenFileCost = options.get(SparkReadOptions.FILE_OPEN_COST).map(Long::parseLong).orElse(null);
 
-    if (table.io() instanceof HadoopFileIO) {
+    if (table.io() instanceof HadoopConfigurable) {

Review comment:
       This seems related to the dev list discussion about the locality read configuration. In the use case I am trying to support, that code path still needs to run and turn locality on by default even though the FileIO is not a `HadoopFileIO`. So we have both cases to support, and the FileIO implementation and file URI alone are not sufficient to determine the locality read preference.
   
   Because this code path runs at reader initialization time, a table property seems to be the best place to store this default behavior, although I agree it is not really elegant since it is Spark-specific.
   
   For now, I think checking for `HadoopConfigurable` instead of `HadoopFileIO` is more flexible: users who do not need locality with a `HadoopConfigurable` implementation are likely not using HDFS anyway and will not hit this path in practice, and they can also use the `locality` read option to override the default and turn it off.
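   The decision logic being discussed can be sketched as follows. This is a minimal, hypothetical model of the check, not Iceberg's actual code: the interfaces, the `localityPreferred` helper, and the sample IO classes are illustrative stand-ins (in Iceberg, `HadoopFileIO` implements `HadoopConfigurable`).

```java
import java.util.Optional;

public class LocalitySketch {
  // Stand-ins for the real Iceberg types; names are illustrative only.
  interface FileIO {}
  interface HadoopConfigurable {}

  static class HadoopFileIO implements FileIO, HadoopConfigurable {}
  static class ObjectStoreIO implements FileIO {}                      // e.g. S3-backed, no Hadoop config
  static class CustomHdfsIO implements FileIO, HadoopConfigurable {}   // custom IO that still wants locality

  // Locality defaults to true only for HadoopConfigurable IO; an explicit
  // "locality" read option, when present, overrides the default either way.
  static boolean localityPreferred(FileIO io, Optional<String> localityOption) {
    if (io instanceof HadoopConfigurable) {
      return localityOption.map(Boolean::parseBoolean).orElse(true);
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(localityPreferred(new HadoopFileIO(), Optional.empty()));     // default on
    System.out.println(localityPreferred(new CustomHdfsIO(), Optional.empty()));     // default on
    System.out.println(localityPreferred(new CustomHdfsIO(), Optional.of("false"))); // user override
    System.out.println(localityPreferred(new ObjectStoreIO(), Optional.empty()));    // never on
  }
}
```

   The point of the sketch is that broadening the `instanceof` check to the interface lets a custom `HadoopConfigurable` implementation get locality by default, while the read option remains the escape hatch.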
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
