jackye1995 commented on a change in pull request #2678:
URL: https://github.com/apache/iceberg/pull/2678#discussion_r655862496
##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -132,7 +132,7 @@
this.splitLookback =
options.get(SparkReadOptions.LOOKBACK).map(Integer::parseInt).orElse(null);
this.splitOpenFileCost =
options.get(SparkReadOptions.FILE_OPEN_COST).map(Long::parseLong).orElse(null);
- if (table.io() instanceof HadoopFileIO) {
+ if (table.io() instanceof HadoopConfigurable) {
Review comment:
This seems to be related to the discussion on the dev list about the
locality read configuration. In the use case I am trying to support, that
code path actually needs to run and turn locality reads on by default even
though the FileIO is not a `HadoopFileIO`. So we have both cases to support,
and the FileIO implementation and file URI alone are not sufficient to
determine the locality read preference.
Because this code path runs at reader initialization time, a table
property seems to be the best place to store this default behavior, although I
agree it is not very elegant since it is Spark-specific.
For now, I think using the `HadoopConfigurable` check instead of the
`HadoopFileIO` check is more flexible: users with a `HadoopConfigurable`
implementation who do not need locality are likely not using HDFS anyway and
will fail the check, and they can also use the `locality` option to override
the default and turn it off.
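To illustrate the decision order being proposed, here is a minimal sketch. The interfaces below are simplified stand-ins for the Iceberg types, and `decideLocality` is a hypothetical helper, not the actual `Reader` code; the point is only that an explicit `locality` option overrides the `HadoopConfigurable`-based default:

```java
// Hypothetical, simplified stand-ins for the org.apache.iceberg types.
interface FileIO {}
interface HadoopConfigurable extends FileIO {}

class LocalityCheck {
  static class HadoopFileIO implements HadoopConfigurable {}
  // A custom HDFS-backed FileIO that is HadoopConfigurable but not a HadoopFileIO.
  static class CustomHdfsIO implements HadoopConfigurable {}

  // Locality defaults to true for any HadoopConfigurable IO; the explicit
  // `locality` read option (null when unset) always wins over the default.
  static boolean decideLocality(FileIO io, Boolean localityOption) {
    if (localityOption != null) {
      return localityOption; // user override takes precedence
    }
    return io instanceof HadoopConfigurable;
  }

  public static void main(String[] args) {
    System.out.println(decideLocality(new HadoopFileIO(), null));   // default on
    System.out.println(decideLocality(new CustomHdfsIO(), null));   // default on too
    System.out.println(decideLocality(new CustomHdfsIO(), false));  // overridden off
  }
}
```

Under the old `instanceof HadoopFileIO` check, the second case would default to locality off, which is the behavior this change avoids.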
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]