[GitHub] [iceberg] aokolnychyi opened a new issue #1306: Spark 3: Consider providing better support for path-based tables

GitBox Fri, 07 Aug 2020 11:24:26 -0700


aokolnychyi opened a new issue #1306:
URL: https://github.com/apache/iceberg/issues/1306



   In Spark 3, support for path-based tables is limited. In particular, I don't 
see a way to create a table at a given location through Spark. Users have to 
use the Iceberg API for that. 
   
   I see a lot of use cases where tables are persisted at a location and there 
is no metastore. Usually, these are HDFS use cases. While we can leverage 
`HadoopCatalog` for such cases, it has drawbacks: it relies on list 
operations to find a table and, more importantly, it requires a special 
storage layout. The latter point matters because we cannot use 
`HadoopCatalog` for path-based tables that were migrated to Iceberg. I want 
Iceberg to support migration of path-based as well as metastore-based tables 
through SQL extensions.
   
   I'd consider adding support to our Spark catalogs to create/load a table 
using a table path as an identifier.
   
   For example, ```CREATE TABLE `path/to/table` USING iceberg``` or ```SELECT * 
FROM `path/to/table` WHERE pred```.  
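
   To make the idea concrete, here is a minimal sketch of how a catalog might 
tell a path identifier apart from a regular one. Everything in it is an 
assumption for illustration: `TableRef`, `Kind`, and `resolve` are hypothetical 
names, not Iceberg or Spark classes, and the rule "a name containing `/` is a 
table location" is just one possible convention. It also assumes the catalog 
receives the identifier as a single string with the backticks already removed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Iceberg or Spark. */
final class TableRef {
  enum Kind { PATH, CATALOG }

  final Kind kind;
  // For PATH: a single element holding the table location.
  // For CATALOG: the dot-separated name parts (namespace levels, then table name).
  final List<String> parts;

  private TableRef(Kind kind, List<String> parts) {
    this.kind = kind;
    this.parts = parts;
  }

  /** Assumed convention: any name containing '/' is treated as a table location. */
  static boolean isPath(String name) {
    return name.contains("/");
  }

  /** Classifies an unquoted identifier as a path-based or a catalog-based reference. */
  static TableRef resolve(String name) {
    if (name == null || name.isEmpty()) {
      throw new IllegalArgumentException("Empty table identifier");
    }
    if (isPath(name)) {
      // Keep the location intact; dots inside a path are not namespace separators.
      return new TableRef(Kind.PATH, Collections.singletonList(name));
    }
    return new TableRef(Kind.CATALOG, Arrays.asList(name.split("\\.")));
  }
}
```

   With this convention, `TableRef.resolve("path/to/table")` would be handled 
as a location, while `TableRef.resolve("db.tbl")` would go through the normal 
catalog lookup. Whether a slash is the right discriminator is open for 
discussion.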


