guilload commented on issue #1155:
URL: https://github.com/apache/iceberg/issues/1155#issuecomment-652598657


   I have some thoughts regarding this. Now that the SerDe has been merged, 
there are two pieces of code that do essentially the same thing:
   - 
[IcebergInputFormat.findTable(...)](https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java#L357)
   - 
[TableResolver.resolveTableFromConfiguration(...)](https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapred/TableResolver.java#L45)
   
   1. Those two classes should probably be consolidated into one.
   2. The ability to define a custom catalog loader is neat. We actually use 
our own catalog at Airbnb and that feature comes in handy.
   3. At the same, most users are going to use either the default Hadoop or 
Hive catalogs, so let's make it easy for them.
   4. I'm personally not a fan of using  a "table path" configuration property 
to pass either a file path or a table identifier and have to look for the `/` 
character  to know which use is intended.
   
   Roughly, I have something like that in mind:
   
   ```python
   def find_table(conf)
     assert conf.get('table.identifier') is None or conf.get('table.path') is 
None
     assert conf.get('catalog') is None or conf.get('catalog.loader.class') is 
None
   
     if conf.get('table.path'):
       return HadoopTables.load(conf.get('table.path'))
   
     if conf.get('catalog.loader.class'):
       loader = load_class(conf.get('catalog.loader.class'))
       return loader.load(conf)
   
     identifier = conf.get('table.identifier')
   
     if (conf.get('catalog') == 'hadoop'):
       return HadoopCatalog.load(identifier)
     elif (conf.get('catalog') == 'hive'):
       return HiveCatalog.load(identifier)
     else:
       raise ...
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to