[GitHub] [iceberg] rdblue commented on pull request #6651: Spark 3.3 write to branch snapshot

via GitHub Fri, 24 Feb 2023 09:15:31 -0800


rdblue commented on PR #6651:
URL: https://github.com/apache/iceberg/pull/6651#issuecomment-1444062541


   I think we're trying to work around a general problem with using options to 
write to a table, but in the end it's probably not a good idea to use read 
options for branching. We've already hit a few issues:
   
   1. Refs can have different schemas that need to be reported when the 
table/branch is loaded
   2. Reads and writes need to use the same table state, but read and write 
options are not reliable for that. Dynamic pruning, in particular, does not 
copy the options
   3. SQL has no reliable way to set the read or write options
   
   I was talking with @aokolnychyi about this yesterday and I think he's right: 
the best way to solve these problems is to load a branch like we would a table, 
so that all uses of the branch/table are consistent. The read path does this by 
using `VERSION AS OF` and `TIMESTAMP AS OF` to load the table with the correct 
state. To get this working in other situations, I think we should go back to 
the original idea of using the branch name in the table identifier, like 
`catalog.db.table.branch`. While that's not ideal for column references (by 
default, columns would be `branch.col` rather than `table.col`), it is the most 
reliable solution for writes, reads, and dynamic pruning.
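
   As a rough sketch of how an identifier like `catalog.db.table.branch` could 
be resolved to a single table state: everything below is hypothetical 
(`ResolvedRef`, `resolve`, and the dict standing in for a catalog are 
illustrative names, not Iceberg or Spark APIs). The assumption is that the full 
identifier is tried as a table first, and only then is the last part treated 
as a branch name.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Set


@dataclass(frozen=True)
class ResolvedRef:
    """Hypothetical result of loading an identifier: a table plus an optional branch."""
    table: str             # e.g. "catalog.db.table"
    branch: Optional[str]  # None means the table's default state


def resolve(identifier: str, tables: Mapping[str, Set[str]]) -> ResolvedRef:
    """Resolve an identifier against a stand-in catalog.

    `tables` maps each table identifier to the names of its branches.
    This is an assumption made for illustration, not a real catalog API.
    """
    # A plain table identifier wins over a branch interpretation.
    if identifier in tables:
        return ResolvedRef(identifier, None)

    # Otherwise, try reading the last part as a branch of the preceding table.
    prefix, _, last = identifier.rpartition(".")
    if prefix in tables and last in tables[prefix]:
        return ResolvedRef(prefix, last)

    raise KeyError(f"no table or branch matches {identifier!r}")


if __name__ == "__main__":
    catalog = {"catalog.db.table": {"audit"}}
    print(resolve("catalog.db.table", catalog))
    print(resolve("catalog.db.table.audit", catalog))
```

Because the branch is chosen once, at load time, every later use of the 
resolved reference (reads, writes, dynamic pruning) would see the same state 
without depending on options being copied along.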


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

