kbendick commented on a change in pull request #4397:
URL: https://github.com/apache/iceberg/pull/4397#discussion_r835711440
##########
File path: data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java
##########
@@ -133,6 +132,20 @@ private TableMigrationUtil() {
}
}
+ private static List<FileStatus> listStatus(Configuration conf, String
partitionUri) throws IOException {
Review comment:
I don't love the idea of being recursive by default. We already know
that the cost of listing in these situations is relatively high. Making them
recursive by default seems like it would only make the relatively expensive
list potentially more expensive.
But my biggest opposition is that I'm not sure about keeping recursive
directories when importing the table to Iceberg. I guess if the metadata files
have the correct file paths in them, then it's fine? I still worry there are
spots we would miss by importing recursively though.
So I'm not opposed to adding it, but I would like for it to be opt-in.
For spark, there's also a config for reading file based dataframes with
recursive lookup: `recursiveFileLookup`.
https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html#recursive-file-lookup
I know this is in `data` and not `spark`, but if we decide not to use some
of the Hive based names, that might be a decent configuration option name.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]