wypoon edited a comment on pull request #3269:
URL: https://github.com/apache/iceberg/pull/3269#issuecomment-960339408


   @jackye1995 I have not been following the proposal/project to add snapshot 
tagging and branching, so I had not considered this question. I only just read 
the proposal 
[document](https://docs.google.com/document/d/1PvxK_0ebEoX3s7nS6-LOJJZdBYr_olTWH9oepNUfJ-A/edit)
 and your doc PR #3425. I did not see any discussion in them about the 
semantics of time travel by timestamp. What do you think the semantics should 
be? One possible option is:
   1. No branch is specified (I'm not sure whether that is possible, but if it 
is): we default to the main branch, and time travel behaves as though only the 
main branch exists.
   2. A branch is specified: starting from the head of the branch, we walk up 
the tree through parent snapshots until we reach a snapshot at or before the 
timestamp, and we select that snapshot. There are two ways this might fail: 
a. We pass the beginning of the branch. We can either say there is no snapshot, 
or continue up the tree into the branch that our branch was created from. 
b. We find no snapshot at or before the timestamp. In that case we say there is 
no snapshot.
   
   This assumes the history is really a tree. (Am I mistaken about it being a 
tree?)
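
   To make option 2 concrete, here is a minimal sketch of the walk, assuming 
the history really is a tree in which every snapshot records a single parent. 
The `Snapshot` class, the `branch_heads` map, and `snapshot_as_of` are 
hypothetical names for illustration, not Iceberg APIs. Because the sketch does 
not record where a branch begins, it implements the "continue up the tree" 
variant of failure case (a), and returns `None` for case (b).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot (not the Iceberg class)."""
    snapshot_id: int
    parent_id: Optional[int]  # None for the root of the tree
    timestamp_ms: int


def snapshot_as_of(
    snapshots: Dict[int, Snapshot],
    branch_heads: Dict[str, int],
    timestamp_ms: int,
    branch: str = "main",  # assumption: no branch given means main
) -> Optional[Snapshot]:
    """Walk from the branch head towards the root and return the first
    snapshot whose timestamp is at or before timestamp_ms, else None."""
    current = snapshots.get(branch_heads[branch])
    while current is not None and current.timestamp_ms > timestamp_ms:
        if current.parent_id is None:
            # Reached the root without finding a match: failure case (b).
            return None
        current = snapshots.get(current.parent_id)
    return current


# Example history: main is 1 -> 2 -> 3; "audit" branches off snapshot 2.
snapshots = {
    s.snapshot_id: s
    for s in [
        Snapshot(1, None, 100),
        Snapshot(2, 1, 200),
        Snapshot(3, 2, 300),
        Snapshot(4, 2, 250),  # only on the "audit" branch
    ]
}
heads = {"main": 3, "audit": 4}

on_branch = snapshot_as_of(snapshots, heads, 260, "audit")
before_branch_point = snapshot_as_of(snapshots, heads, 220, "audit")
on_main = snapshot_as_of(snapshots, heads, 260)
too_early = snapshot_as_of(snapshots, heads, 50)
```

   In this example, time travel to 260 on `audit` selects snapshot 4, while 
the same timestamp on main selects snapshot 2. Time travel to 220 on `audit` 
crosses the branch point and lands on snapshot 2; the stricter variant of (a) 
would instead report no snapshot, which requires knowing where the branch 
starts.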
   
   In addition, we need a syntax for specifying a tag or a branch. I don't know 
if one has been proposed, though I saw above an allusion to using `.` for 
specifying branch and tag as well. Or will we support something like "SET 
BRANCH `<branch>`" in SQL, or `.option("branch", <branch>)` in the DataFrame 
API? It sounds like we would, since you consider the option to "first switch 
the current branch to point to a different branch".
   
   Do you think we need the `org.apache.spark.sql.connector.catalog.Identifier` 
to have enough information to load a table using the necessary 
branch/tag/snapshot-id? I think the answer depends on what syntax we intend to 
support for specifying branch and tag. Here are some things I can think of:
   1. Support using a tag instead of a snapshot id in the VERSION AS OF clause: 
`SELECT * FROM t VERSION AS OF <tag>`. Is that feasible technically, or do we 
need a different variant of AS OF?
   2. Support specifying a tag or a branch as part of the table identifier: 
`SELECT * FROM t.<tag>` or `SELECT * FROM t.<branch>` or `SELECT * FROM 
t.<branch> TIMESTAMP AS OF <timestamp>`. (Here I'm using `.` just for 
illustration.)
   
   I think that if we use the `.` syntax, then we do not need to pass a branch 
or tag field in the `Identifier`. We would resolve the branch/tag on the 
Iceberg side along the lines shown in this PR for snapshot-id. But we do need 
to pass the snapshot-id or timestamp that comes from the AS OF clause in the 
`Identifier`.

   @huaxingao, setting aside the branch/tag question, do you think we need the 
ability to pass either a snapshot-id or a timestamp in the `Identifier`? How do 
you propose to distinguish the two?
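
   One way the distinction could be made, sketched under assumptions: encode 
the selector as a trailing name part and classify it by prefix on the Iceberg 
side. The `snapshot_id_` and `at_timestamp_` prefixes and the 
`classify_suffix` function are hypothetical, chosen only to illustrate the 
idea; they are not taken from the Spark `Identifier` API.

```python
import re
from typing import Tuple, Union

# Hypothetical encodings for the trailing part of a table name.
_SNAPSHOT_ID = re.compile(r"snapshot_id_(\d+)")
_AT_TIMESTAMP = re.compile(r"at_timestamp_(\d+)")


def classify_suffix(suffix: str) -> Tuple[str, Union[int, str]]:
    """Classify a trailing name part as a snapshot id, a timestamp in
    milliseconds, or (by elimination) a branch/tag name."""
    match = _SNAPSHOT_ID.fullmatch(suffix)
    if match:
        return ("snapshot-id", int(match.group(1)))
    match = _AT_TIMESTAMP.fullmatch(suffix)
    if match:
        return ("timestamp", int(match.group(1)))
    # Anything else is treated as a branch or tag name.
    return ("ref", suffix)


by_id = classify_suffix("snapshot_id_123")
by_time = classify_suffix("at_timestamp_1636000000000")
by_ref = classify_suffix("audit")
```

   A prefix scheme like this keeps the `Identifier` itself unchanged, at the 
cost of an ambiguity: a branch or tag whose name happens to match one of the 
prefixes could not be addressed this way.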


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


