Pucheng,

For cherry-pick, we've only implemented the operations that we know can be
safely cherry-picked without knowing more context about the operation.
Right now, those are cases where the operation is actually a fast-forward
(not really a cherry-pick), an append that only adds new data, or a
dynamic partition overwrite, where we can apply the same partition
replacement logic because the commit is an idempotent partition replacement.
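
As a rough sketch of that rule (the `Snapshot` class, its `dynamic_overwrite`
flag, and `can_cherry_pick` are made up for illustration and are not the
actual Iceberg code), the check looks something like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified stand-in for a table snapshot.
    snapshot_id: int
    parent_id: Optional[int]
    operation: str                   # e.g. "append", "overwrite", "delete"
    dynamic_overwrite: bool = False  # assumed marker for partition replacement


def can_cherry_pick(picked: Snapshot, current_id: Optional[int]) -> bool:
    """Return True if `picked` can be applied on top of the current snapshot."""
    # Fast-forward: the picked snapshot was committed directly on top of the
    # current state, so nothing needs to be replayed.
    if picked.parent_id == current_id:
        return True
    # An append only adds new data files.
    if picked.operation == "append":
        return True
    # A dynamic partition overwrite can be re-applied as the same
    # idempotent partition replacement.
    if picked.operation == "overwrite" and picked.dynamic_overwrite:
        return True
    # Everything else (including delete) is rejected.
    return False


s3 = Snapshot(snapshot_id=3, parent_id=2, operation="delete")
print(can_cherry_pick(s3, current_id=1))  # the case from your history
```

With the table reset to s1, s3 is a delete whose parent is s2, so none of
the three cases match and the pick is rejected.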

I think the reason why we didn't add delete to that list is that you can
configure deletes in ways that can't necessarily be cherry-picked. For
example, suppose I have an unpartitioned table and I run `DELETE FROM t
WHERE ts < TIMESTAMP '2023-07-20T15:36:17.497811'`. That delete could remove
whole data files using column ranges. But if we were to cherry-pick the changes,
we would need to know that picking the commit does the same thing as
running that SQL again. Currently, we can't know that because we don't
store the filter used to run the delete anywhere. As a result, we can't
know whether it is safe to pick the changes or if the delete would have
removed additional data files.
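
Here is a small illustration of the problem. It is only a sketch: the
`DataFile` class, the integer timestamps, and the file names are made up,
and the real metadata is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    # Hypothetical data file with the min/max range of its `ts` column.
    path: str
    min_ts: int
    max_ts: int


def files_removed_by_delete(files, cutoff):
    """Files that a `ts < cutoff` delete can drop whole from column ranges."""
    return {f.path for f in files if f.max_ts < cutoff}


# Table state that the delete was originally committed against.
original = [DataFile("a.parquet", 0, 10), DataFile("b.parquet", 50, 90)]
recorded = files_removed_by_delete(original, cutoff=20)

# Table state we want to pick the delete onto: it has one more old file.
target = original + [DataFile("c.parquet", 5, 15)]
rerun = files_removed_by_delete(target, cutoff=20)

# Files that re-running the SQL would remove but the picked commit would not.
missed = rerun - recorded
print(sorted(missed))
```

The commit only records the files it removed, so replaying it on the
target state would leave `c.parquet` in place even though the original
SQL would have removed it.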

To fix this, I think we just need to add the delete filter to the snapshot
so that we can re-run it to validate the result would be the same. Then we
can implement cherry-pick for delete operations.

Ryan

On Thu, Jul 20, 2023 at 3:10 PM Pucheng Yang <[email protected]>
wrote:

> Hi community,
>
> I have a table that has the history below:
>
> null -> s1: overwrite (partition1) -> s2: overwrite (partition2) ->
> s3(current): delete (partition1).
>
> I want to undo the commit that generates s3 because it is a bad commit,
> and my goal is to have a history like below:
>
> null -> s1: overwrite (partition1) -> s3(current): delete (partition1).
>
> In order to do so I :
> 1. reset current snapshot to snapshot s1
> 2. cherry-pick snapshot s3
>
> This seems not allowed and has the below error "Cannot cherry-pick
> snapshot xxx: not append, dynamic overwrite, or fast-forward".
>
> May I know why this is not allowed? or is this theoretical supported but
> just the implementation is not there yet?
>
> Best,
> Pucheng
>


-- 
Ryan Blue
Tabular
