Thanks Ryan. My organization was experimenting with something that requires this, but after today's discussion we decided not to pursue it. Thanks for the examples, though; they will really help when we come back to this topic in the future.

On Thu, Jul 20, 2023 at 3:40 PM Ryan Blue <[email protected]> wrote:

> Pucheng,
>
> For cherry-pick, we've only implemented the operations that we know can be
> safely cherry-picked without knowing more context about the operation.
> Right now, those are cases where the operation is actually a fast-forward
> (not actually a cherry-pick), an append that only adds new data, or a
> dynamic partition overwrite where we can apply the same partition
> replacement logic because the commit is an idempotent partition
> replacement.
>
> I think the reason why we didn't add delete to that list is that you can
> configure deletes in ways that can't necessarily be cherry-picked. For
> example, suppose I have an unpartitioned table and I run
> `DELETE FROM t WHERE ts < TIMESTAMP '2023-07-20T15:36:17.497811'`.
> That delete could remove whole data files using column ranges. But if we
> were to cherry-pick the changes, we would need to know that picking the
> commit does the same thing as running that SQL again. Currently, we can't
> know that because we don't store the filter used to run the delete
> anywhere. As a result, we can't know whether it is safe to pick the
> changes or if the delete would have removed additional data files.
>
> To fix this, I think we just need to add the delete filter to the snapshot
> so that we can re-run it to validate that the result would be the same.
> Then we can implement cherry-pick for delete operations.
>
> Ryan
>
> On Thu, Jul 20, 2023 at 3:10 PM Pucheng Yang <[email protected]>
> wrote:
>
>> Hi community,
>>
>> I have a table that has the history below:
>>
>> null -> s1: overwrite (partition1) -> s2: overwrite (partition2) ->
>> s3(current): delete (partition1).
>>
>> I want to undo the commit that generates s2 because it is a bad commit,
>> and my goal is to have a history like below:
>>
>> null -> s1: overwrite (partition1) -> s3(current): delete (partition1).
>>
>> In order to do so, I:
>> 1. reset the current snapshot to snapshot s1
>> 2. cherry-pick snapshot s3
>>
>> This does not seem to be allowed and fails with the error "Cannot
>> cherry-pick snapshot xxx: not append, dynamic overwrite, or fast-forward".
>>
>> May I know why this is not allowed? Or is this theoretically supported
>> and just the implementation is not there yet?
>>
>> Best,
>> Pucheng
>>
>
>
> --
> Ryan Blue
> Tabular
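
For anyone coming back to this thread later, the rule Ryan describes and the validation he proposes for deletes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Iceberg's implementation: the `Snapshot` and `DataFile` classes, the `replaces_partitions` flag, and the functions `cherry_pick_kind`, `whole_files_matching`, and `delete_pick_is_safe` are all hypothetical names made up for illustration. The delete filter is reduced to a single `ts < cutoff` predicate, and only whole-file drops based on column ranges are modeled. Only the error message text is taken from the thread.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    operation: str                     # "append", "overwrite", or "delete"
    replaces_partitions: bool = False  # hypothetical flag: dynamic partition overwrite


@dataclass(frozen=True)
class DataFile:
    path: str
    min_ts: int  # lower bound of the ts column in this file (kept for realism)
    max_ts: int  # upper bound of the ts column in this file


def cherry_pick_kind(snapshot: Snapshot, current_id: Optional[int]) -> str:
    """Classify a pick onto the current snapshot, or reject it."""
    if snapshot.parent_id == current_id:
        return "fast-forward"          # not really a cherry-pick
    if snapshot.operation == "append":
        return "append"                # only adds new data
    if snapshot.operation == "overwrite" and snapshot.replaces_partitions:
        return "dynamic overwrite"     # idempotent partition replacement
    raise ValueError(
        f"Cannot cherry-pick snapshot {snapshot.snapshot_id}: "
        "not append, dynamic overwrite, or fast-forward"
    )


def whole_files_matching(files: Iterable[DataFile], cutoff_ts: int) -> Set[str]:
    """Files that `DELETE ... WHERE ts < cutoff` could drop whole:
    the column range proves every row in the file matches."""
    return {f.path for f in files if f.max_ts < cutoff_ts}


def delete_pick_is_safe(removed_paths: Iterable[str],
                        target_files: Iterable[DataFile],
                        cutoff_ts: int) -> bool:
    """Re-run the stored filter against the target state. The pick is safe
    only if it would remove exactly the files the original commit removed."""
    return whole_files_matching(target_files, cutoff_ts) == set(removed_paths)


# History from the thread: null -> s1 -> s2 -> s3, then reset to s1.
s1 = Snapshot(1, None, "overwrite", replaces_partitions=True)
s2 = Snapshot(2, 1, "overwrite", replaces_partitions=True)
s3 = Snapshot(3, 2, "delete")

# Hypothetical files for the delete-validation part.
a = DataFile("a.parquet", 1, 5)
b = DataFile("b.parquet", 10, 20)
c = DataFile("c.parquet", 2, 6)
```

With the table reset to s1, `cherry_pick_kind(s3, current_id=1)` raises the error quoted above, because s3 is a delete whose parent is s2. If the delete had recorded its filter (here, a cutoff of 8) and had removed only `a.parquet`, then `delete_pick_is_safe({"a.parquet"}, [a, b], 8)` accepts the pick, while a target that also contains `c` is rejected because re-running the filter would remove an additional file.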
