guilload commented on issue #1155: URL: https://github.com/apache/iceberg/issues/1155#issuecomment-652598657
I have some thoughts regarding this. Now that the SerDe has been merged, there are two pieces of code that do essentially the same thing: - [IcebergInputFormat.findTable(...)](https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java#L357) - [TableResolver.resolveTableFromConfiguration(...)](https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapred/TableResolver.java#L45) 1. Those two classes should probably be consolidated into one. 2. The ability to define a custom catalog loader is neat. We actually use our own catalog at Airbnb and that feature comes in handy. 3. At the same, most users are going to use either the default Hadoop or Hive catalogs, so let's make it easy for them. 4. I'm personally not a fan of using a "table path" configuration property to pass either a file path or a table identifier and have to look for the `/` character to know which use is intended. Roughly, I have something like that in mind: ```python def find_table(conf) assert conf.get('table.identifier') is None or conf.get('table.path') is None assert conf.get('catalog') is None or conf.get('catalog.loader.class') is None if conf.get('table.path'): return HadoopTables.load(conf.get('table.path')) if conf.get('catalog.loader.class'): loader = load_class(conf.get('catalog.loader.class')) return loader.load(conf) identifier = conf.get('table.identifier') if (conf.get('catalog') == 'hadoop'): return HadoopCatalog.load(identifier) elif (conf.get('catalog') == 'hive'): return HiveCatalog.load(identifier) else: raise ... ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
