> But in this case, we may have to copy all the snapshots for data migration.

We need an API to describe how to copy snapshots.
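
As a rough illustration only (not the PIP-18 design): the choice between
copying one snapshot and copying all of them can be reduced to which file
list the caller passes in. The Python sketch below uses hypothetical names
(`clone_files`, `create_table`) and a plain local directory instead of a
real Paimon table or catalog. How the file list of a snapshot is obtained
is not modeled here.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List


def clone_files(
    source_root: Path,
    target_root: Path,
    relative_files: Iterable[str],
    create_table: Callable[[Path], None],
) -> List[Path]:
    """Copy only the listed files, then register the target table.

    relative_files: paths relative to source_root. The caller decides
        whether this covers one snapshot or all snapshots (assumption:
        the list is resolved elsewhere).
    create_table: hypothetical callback standing in for creating the
        table in the target catalog.
    """
    copied: List[Path] = []
    for rel in relative_files:
        src = source_root / rel
        dst = target_root / rel
        # Recreate the directory layout under the target path.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    # Files alone are not enough: the table must also exist in the catalog.
    create_table(target_root)
    return copied
```

Passing the files of a single snapshot gives the "copy one snapshot"
behavior; passing the union of the files of every snapshot gives the full
migration case.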

I will think about this.

Best,
Jingsong

On Wed, Mar 20, 2024 at 4:06 PM Aitozi <[email protected]> wrote:
>
> Hi Jingsong
> > No, I mean using metastore=hive, using HiveCatalog. I just want to
> > clarify that simple file copying cannot fully handle schema migration,
> > as the schema should be synchronized to HMS.
>
> So in terms of implementation, is the actual workflow like this?
> 1. pick the file list of one snapshot
> 2. copy the files to the target table's path
> 3. create the table from the table schema
>
> > You can use HiveCatalog here, so that different databases can be on
> > different clusters, which can complete data migration across clusters.
>
> Yes. But in this case, we may have to copy all the snapshots for data
> migration.
> So I think maybe we do not have to limit the copy to a single snapshot,
> as described in the doc.
>
> Best,
> Aitozi.
>
>
> On Wed, Mar 20, 2024 at 3:25 PM Jingsong Li <[email protected]> wrote:
>
> > Hi Aitozi,
> >
> > Thanks for your feedback.
> >
> > > 1. Is the `clone` procedure equivalent to the following process?
> > >      - create the target table in the target catalog
> > >      - Insert into the target_table select * from source_table
> > >        /*+ OPTIONS('scan.tag-name' = 'tag-1') */;
> >
> > Yes.
> >
> > > 2. What do you mean by
> > > > "The target table may need to synchronize Hive metadata, which
> > > > means using HiveCatalog, which cannot be solved by copying files."
> > > Does it mean cloning a Paimon table to a Hive table?
> >
> > No, I mean using metastore=hive, using HiveCatalog. I just want to
> > clarify that simple file copying cannot fully handle schema migration,
> > as the schema should be synchronized to HMS.
> >
> > > If we clone a Paimon table to another Paimon table, can we use a file
> > > copy solution?
> >
> > In terms of implementation, yes, it is file copying, but compared to
> > full directory copying:
> > 1. Copy only partial files.
> > 2. At the same time, tables will also be created in the catalog.
> >
> > > I guess it may be useful when we have to move data between clusters.
> >
> > You can use HiveCatalog here, so that different databases can be on
> > different clusters, which can complete data migration across clusters.
> >
> > Best,
> > Jingsong
> >
> > On Tue, Mar 19, 2024 at 11:25 PM Aitozi <[email protected]> wrote:
> > >
> > > Hi Jingsong,
> > >
> > >     Clone table is a useful feature. I have two questions here.
> > >
> > > 1. Is the `clone` procedure equivalent to the following process?
> > >        - create the target table in the target catalog
> > >        - Insert into the target_table select * from source_table
> > >          /*+ OPTIONS('scan.tag-name' = 'tag-1') */;
> > > 2. What do you mean by
> > >
> > > > "The target table may need to synchronize Hive metadata, which means
> > > > using HiveCatalog, which cannot be solved by copying files."
> > >
> > > Does it mean cloning a Paimon table to a Hive table?
> > >
> > > If we clone a Paimon table to another Paimon table, can we use a file
> > > copy solution?
> > > I guess it may be useful when we have to move data between clusters.
> > >
> > > Best,
> > > Aitozi
> > >
> > > On Mon, Mar 18, 2024 at 1:30 PM Jingsong Li <[email protected]> wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I have heard many times that there is a need to copy the entire table,
> > > > and my advice to them is often to use file system file copying.
> > > >
> > > > But there are a few issues:
> > > > 1. It is necessary to copy a large number of files, and it is likely
> > > > that some files will be deleted due to ongoing work, resulting in
> > > > copying failure.
> > > > 2. The target table may need to synchronize Hive metadata, which means
> > > > using HiveCatalog, which cannot be solved by copying files.
> > > >
> > > > So I suggest we have a clone procedure. [1]
> > > >
> > > > Contributors are also welcome to develop this PIP together, and I
> > > > will help review your code.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> >
