[GitHub] [iceberg] kbendick commented on a change in pull request #4397: Add support for listing partition recursively during the table migration

GitBox Fri, 25 Mar 2022 21:12:29 -0700


kbendick commented on a change in pull request #4397:
URL: https://github.com/apache/iceberg/pull/4397#discussion_r835711440




##########
File path: data/src/main/java/org/apache/iceberg/data/TableMigrationUtil.java
##########
@@ -133,6 +132,20 @@ private TableMigrationUtil() {
     }
   }
 
+  private static List<FileStatus> listStatus(Configuration conf, String 
partitionUri) throws IOException {

Review comment:
       I don't love the idea of being recursive by default. We already know 
that the cost of listing in these situations is relatively high. Making them 
recursive by default seems like it would only make the relatively expensive 
list potentially more expensive.
   
   But my biggest opposition is that I'm not sure about keeping recursive 
directories when importing the table to Iceberg. I guess if the metadata files 
have the correct file paths in them, then it's fine? I still worry there are 
spots we would miss by importing recursively though.
   
   So I'm not opposed to adding it, but I would like for it to be opt-in.
   
   For spark, there's also a config for reading file based dataframes with 
recursive lookup: `recursiveFileLookup`. 
https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html#recursive-file-lookup
   
   I know this is in `data` and not `spark`, but if we decide not to use some 
of the Hive based names, that might be a decent configuration option name.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] kbendick commented on a change in pull request #4397: Add support for listing partition recursively during the table migration

Reply via email to