rdblue opened a new pull request #3089:
URL: https://github.com/apache/iceberg/pull/3089


   This updates Spark 3 to create metadata tables directly inside actions, 
instead of loading them through a `SparkCatalog`.
   
   This avoids a problem where metadata tables used by the expire snapshots 
action fell back to `HadoopFileIO` instead of the table's custom `FileIO` 
implementation. The action uses metadata tables to build file reachability 
datasets, but those tables are based on a `StaticTableOperations` that points 
directly to a metadata file path. `SparkTableUtil` would create a metadata 
table by translating the table back to an identifier and loading it through 
Spark; for static tables, that identifier is the metadata file location, so 
Spark loaded the metadata table using `HadoopTables` instead of a catalog, 
which in turn uses `HadoopFileIO`.
   
   The solution is to construct metadata tables directly from the `Table` 
instance passed into `SparkTableUtil` for Spark 3, so they inherit that 
table's `FileIO`.
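
   For illustration, the direct-construction approach can be sketched roughly 
like this, using Iceberg's `MetadataTableUtils` helper; this is a sketch of 
the idea, not the exact code in the diff, and the helper method shown is an 
assumption about how the metadata table is derived:

   ```java
   import org.apache.iceberg.MetadataTableType;
   import org.apache.iceberg.MetadataTableUtils;
   import org.apache.iceberg.Table;

   // Sketch: instead of round-tripping through an identifier and a catalog
   // (or HadoopTables) lookup, derive the metadata table directly from the
   // Table instance that was already loaded. Because the metadata table is
   // built from the same table, it uses that table's FileIO rather than
   // falling back to HadoopFileIO.
   Table allManifests = MetadataTableUtils.createMetadataTableInstance(
       table, MetadataTableType.ALL_MANIFESTS);
   ```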


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
