rdblue commented on PR #6651: URL: https://github.com/apache/iceberg/pull/6651#issuecomment-1444062541
I think that we have a general problem with using options for writing to a table that we're trying to work around, but in the end it's probably not a good idea to use read options for branching. There are a few issues we've already hit:

1. Refs can have different schemas that need to be reported when the table/branch is loaded.
2. Reads and writes need to use the same table state, but read and write options are not reliable for that. Dynamic pruning, in particular, does not copy the options.
3. SQL has no reliable way to set the read or write options.

I was talking with @aokolnychyi about this yesterday and I think he's right: the best way to solve these problems is to load a branch like we would a table, so that all uses of the branch/table are consistent. The read path does this by using `VERSION AS OF` and `TIMESTAMP AS OF` to load the table with the correct state.

To get this working in other situations, I think we should go back to the original idea of using the branch name in the table identifier, like `catalog.db.table.branch`. While that's not ideal for column references (by default, columns would be `branch.col` rather than `table.col`), it is the most reliable solution for writes, reads, and dynamic pruning.
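To make point 2 and the identifier proposal concrete, here is a toy model. It is a sketch under stated assumptions, not Iceberg or Spark code: `TableRef`, `Scan`, `resolve`, `deriveScanWithoutOptions`, and the `"branch"` option key are all hypothetical. It resolves `catalog.db.table.branch` by treating the last name part as a branch when the remaining parts name a known table. It then models a derived scan (standing in for dynamic pruning) that copies only the identifier. A branch passed as an option is lost; a branch encoded in the identifier survives.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch only: none of these types are Iceberg or Spark APIs.
class BranchIdentifierSketch {

  static final String MAIN = "main";

  // A resolved reference: the table name plus the ref to read/write.
  record TableRef(String table, String branch) {}

  // Toy scan. Assumption: only the identifier is guaranteed to propagate to derived scans.
  record Scan(String identifier, Map<String, String> options) {}

  // Resolves an identifier against known tables and their branches.
  // If the full name is a table, it refers to main. Otherwise, if the name minus
  // its last part is a table with a branch of that name, the last part is the branch.
  static Optional<TableRef> resolve(String identifier, Map<String, Set<String>> branchesByTable) {
    if (branchesByTable.containsKey(identifier)) {
      return Optional.of(new TableRef(identifier, MAIN));
    }
    int lastDot = identifier.lastIndexOf('.');
    if (lastDot < 0) {
      return Optional.empty();
    }
    String table = identifier.substring(0, lastDot);
    String branch = identifier.substring(lastDot + 1);
    Set<String> branches = branchesByTable.get(table);
    if (branches != null && branches.contains(branch)) {
      return Optional.of(new TableRef(table, branch));
    }
    return Optional.empty();
  }

  // Models a derived scan (e.g. for dynamic pruning) that keeps the identifier but drops options.
  static Scan deriveScanWithoutOptions(Scan original) {
    return new Scan(original.identifier(), Map.of());
  }

  // The branch a scan actually reads: a branch in the identifier wins,
  // otherwise fall back to a hypothetical "branch" option, otherwise main.
  static String effectiveBranch(Scan scan, Map<String, Set<String>> branchesByTable) {
    TableRef ref = resolve(scan.identifier(), branchesByTable).orElseThrow();
    if (!ref.branch().equals(MAIN)) {
      return ref.branch();
    }
    return scan.options().getOrDefault("branch", MAIN);
  }

  public static void main(String[] args) {
    Map<String, Set<String>> catalog = Map.of("catalog.db.table", Set.of("main", "audit"));

    Scan viaOption = new Scan("catalog.db.table", Map.of("branch", "audit"));
    Scan viaIdentifier = new Scan("catalog.db.table.audit", Map.of());

    System.out.println(effectiveBranch(viaOption, catalog));                               // audit
    // The option-based branch is lost once the derived scan drops the options.
    System.out.println(effectiveBranch(deriveScanWithoutOptions(viaOption), catalog));     // main
    // The identifier-based branch survives because it is part of the name.
    System.out.println(effectiveBranch(deriveScanWithoutOptions(viaIdentifier), catalog)); // audit
  }
}
```

One design consequence of this resolution order: if both a table named `catalog.db.table.audit` and a branch `audit` of `catalog.db.table` exist, the full table name wins. A real catalog would need its own rule for that ambiguity.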
