[GitHub] [iceberg] wypoon commented on pull request #3269: Spark: Support time travel through table names

GitBox Fri, 05 Nov 2021 12:54:22 -0700


wypoon commented on pull request #3269:
URL: https://github.com/apache/iceberg/pull/3269#issuecomment-961362777



   @huaxingao I'm guessing that if there is an AS OF clause in the SQL query, 
you will call `CatalogV2Util.loadTable(CatalogPlugin, Identifier, String)` but 
if there isn't an AS OF clause, `CatalogV2Util.loadTable(CatalogPlugin, 
Identifier)` will be called. Is that right?
   
   The problem is that we also need to support the DataFrame API with 
`.option("snapshot-id", ...)` or `.option("as-of-timestamp", ...)`. In those 
cases, `CatalogV2Util.loadTable(CatalogPlugin, Identifier)` will be called. In 
this code path, the `Identifier` is provided by `IcebergSource` (which 
implements `SupportsCatalogOptions`) via `extractIdentifier`. Ideally, 
`Identifier` would have a `version` field that we could set in the `Identifier` 
we return from `IcebergSource#extractIdentifier`. If `Identifier` cannot be 
changed to support `version`, this is less convenient for us. In earlier 
iterations of #1508, I created a `SnapshotAwareIdentifier` in Iceberg that 
extends `Identifier` and returned it from `IcebergSource#extractIdentifier`. 
`SparkCatalog#loadTable(Identifier)` then used the `SnapshotAwareIdentifier` to 
identify the snapshot.
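
   To make the `SnapshotAwareIdentifier` idea concrete, here is a minimal 
sketch. It is not the code from #1508. The `Identifier` interface below is a 
local stand-in for Spark's `org.apache.spark.sql.connector.catalog.Identifier` 
so the snippet compiles without Spark on the classpath, and the 
`SnapshotSelection` helper, the `snapshotId` field, and all method names are 
hypothetical. It only carries a snapshot ID; an as-of timestamp would need a 
similar field.

```java
import java.util.OptionalLong;

// Local stand-in for Spark's Identifier interface (assumption: only the
// namespace() and name() accessors matter for this sketch).
interface Identifier {
  String[] namespace();

  String name();
}

// Hypothetical identifier that carries a snapshot ID alongside the table name.
final class SnapshotAwareIdentifier implements Identifier {
  private final String[] namespace;
  private final String name;
  private final long snapshotId;

  SnapshotAwareIdentifier(String[] namespace, String name, long snapshotId) {
    this.namespace = namespace.clone();
    this.name = name;
    this.snapshotId = snapshotId;
  }

  @Override
  public String[] namespace() {
    return namespace.clone();
  }

  @Override
  public String name() {
    return name;
  }

  long snapshotId() {
    return snapshotId;
  }
}

// Hypothetical stand-in for the check a catalog's loadTable(Identifier)
// could perform before deciding which snapshot to read.
final class SnapshotSelection {
  private SnapshotSelection() {}

  // Returns the requested snapshot ID, or empty for a plain identifier.
  static OptionalLong requestedSnapshot(Identifier ident) {
    if (ident instanceof SnapshotAwareIdentifier) {
      return OptionalLong.of(((SnapshotAwareIdentifier) ident).snapshotId());
    }
    return OptionalLong.empty();
  }
}
```

   With this shape, an `extractIdentifier` implementation could return a 
`SnapshotAwareIdentifier` when the `snapshot-id` option is set, and the catalog 
would fall back to the current snapshot when `requestedSnapshot` is empty.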
   
   @rdblue what are your thoughts on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

