Hi Jingsong and Zelin,

My opinion is as follows:
We must ensure that there is at least one complete snapshot in the target
table after the clone procedure is finished.

> For cloning specified snapshot or tag, undoubtedly, rollback operation
> (deleting copied files) and an exception should be thrown.

My opinion is exactly the same as Jingsong's.



> For cloning all snapshots and tags, we should ignore deleted files to
> keep this clone working. To avoid conflicting with expiring snapshots and
> deleting files in streaming writing job.

We need to discuss whether the following corner case can occur:
    1. The source table initially has three snapshots (snapshot-1,
       snapshot-2, snapshot-3).
    2. We start a clone procedure. All files belonging to these three
       snapshots are selected.
    3. A Flink batch job starts copying the files.
    4. Meanwhile, the streaming write job commits snapshot-4, snapshot-5,
       and snapshot-6.
    5. snapshot-3 is hit by the snapshot expiration logic, and some of its
       files are deleted.
    6. The Flink batch job runs for a long time because of the cluster
       environment and other factors. It eventually finishes, having
       ignored the FileNotFound exceptions.
    7. In the end, there is no complete snapshot in the target table.
Can this corner case occur? Let's discuss it.
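
To make the corner case concrete, here is a minimal toy simulation in
Python. It is only a sketch under stated assumptions: the file names, the
file-to-snapshot mapping, and the helper functions are all hypothetical and
are not Paimon APIs. It also assumes that snapshot-1 and snapshot-2 have
expired by the time snapshot-3 is hit, and that a missing source file is
skipped silently.

```python
# Toy model of the race between the clone job and snapshot expiration.
# All names here are hypothetical illustrations, not Paimon APIs.

def clone_ignoring_missing(plan, source_files):
    """Copy every planned file that still exists in the source.

    A file that is no longer in source_files models a FileNotFound
    exception that the copy job ignores.
    """
    target = set()
    for files in plan.values():
        for name in files:
            if name in source_files:
                target.add(name)
    return target


def complete_snapshots(plan, target_files):
    """Return the snapshots whose files are all present in the target."""
    return [snap for snap, files in plan.items() if files <= target_files]


# Steps 1-2: three snapshots exist and all of their files are selected.
plan = {
    "snapshot-1": {"f1"},
    "snapshot-2": {"f1", "f2"},
    "snapshot-3": {"f2", "f3"},
}
source_files = {"f1", "f2", "f3"}

# Step 4: the streaming job commits newer snapshots (files outside the plan).
source_files |= {"f4", "f5", "f6"}

# Step 5: expiration deletes old files, including some used by snapshot-3.
source_files -= {"f1", "f2"}

# Step 6: the slow batch job finally copies what is left.
target_files = clone_ignoring_missing(plan, source_files)

# Step 7: check which snapshots are complete in the target table.
result = complete_snapshots(plan, target_files)
print(sorted(target_files), result)
```

In this toy model only f3 reaches the target, so none of the three planned
snapshots is complete. If the real procedure can end up in such a state, a
final check along the lines of complete_snapshots could let it fail instead
of finishing silently.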



On Wed, Apr 3, 2024 at 3:13 PM Jingsong Li <[email protected]> wrote:

> > I want to know that if in the clone procedure, the specified snapshot or
> > tag is being deleted, how do we handle the exception?
> > Should we stop the procedure and clean the temporary target table
> > directory?
>
> - For cloning specified snapshot or tag, undoubtedly, rollback
> operation (deleting copied files) and an exception should be thrown.
>
> - For cloning all snapshots and tags, we should ignore deleted files
> to keep this clone working. To avoid conflicting with expiring
> snapshots and deleting files in streaming writing job.
>
> Best,
> Jingsong
>
> On Wed, Apr 3, 2024 at 3:08 PM yu zelin <[email protected]> wrote:
> >
> > Hi Jingsong,
> >
> > I want to know that if in the clone procedure, the specified snapshot or
> > tag is being deleted, how do we handle the exception?
> > Should we stop the procedure and clean the temporary target table
> > directory?
> >
> > Best regards,
> > Zelin Yu
> >
> > On Mon, Mar 18, 2024 at 1:30 PM Jingsong Li <[email protected]> wrote:
> >
> > > Hi devs,
> > >
> > > I have heard many times that there is a need to copy the entire table,
> > > and my advice to them is often to use file system file copying.
> > >
> > > But there are a few issues:
> > > 1. It is necessary to copy a large number of files, and it is likely
> > > that some files will be deleted due to ongoing work, resulting in
> > > copying failure.
> > > 2. The target table may need to synchronize Hive metadata, which means
> > > using HiveCatalog, which cannot be solved by copying files.
> > >
> > > So I suggest we have a clone procedure. [1]
> > >
> > > Also, welcome contributors to develop this PIP together, and I will
> > > help you review your code.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
> > >
> > > Best,
> > > Jingsong
> > >
>
