Yes, Vinoth, you're right: it is more of an exporter, one that exports a
snapshot from a Hudi dataset.

It should support MOR too; it would simply leverage the existing
SnapshotCopier logic to find the latest file slices.
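
To make the plain-parquet export from the original proposal a bit more
concrete, here is a minimal sketch in PySpark terms. It assumes the latest
records have already been loaded into a Spark DataFrame; the function names
`data_columns` and `export_as_parquet` are hypothetical and not part of Hudi.
The metadata column names are the `_hoodie_*` fields Hudi adds to each record.

```python
# Hypothetical sketch; the helper names are illustrative, not Hudi APIs.

# Metadata columns that Hudi adds to every record.
HUDI_METADATA_COLUMNS = [
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
]


def data_columns(columns):
    """Return the column names that are not Hudi metadata fields, in order."""
    return [c for c in columns if c not in HUDI_METADATA_COLUMNS]


def export_as_parquet(df, output_path, num_partitions=None):
    """Write a Spark DataFrame of latest records as plain parquet files.

    Assumes `df` already holds the latest snapshot of the dataset.
    """
    # Remove Hudi metadata fields by selecting only the data columns.
    out = df.select(*data_columns(df.columns))
    # Optional custom repartitioning before the write.
    if num_partitions is not None:
        out = out.repartition(num_partitions)
    # Output via the Spark DataFrame writer.
    out.write.mode("overwrite").parquet(output_path)
```

How the latest file slices are resolved for MOR is left out here; that part
would come from the existing SnapshotCopier logic.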

So would it be a good idea to create an RFC for further discussion?


On Mon, Nov 11, 2019 at 4:31 PM Vinoth Chandar <vin...@apache.org> wrote:

> What you suggest sounds more like an `Exporter` tool?  I imagine you will
> support MOR as well?  +1 on the idea itself. It could be useful if a plain
> parquet snapshot was generated as a backup.
>
> On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu <xu.shiyan.raym...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > The existing SnapshotCopier under Hudi Utilities is a Hudi-to-Hudi copy and
> > is primarily for backup purposes.
> >
> > I would like to start an RFC for a more generic Hudi snapshotter, which
> >
> >    - Supports existing SnapshotCopier features
> >    - Adds an option to export a Hudi dataset to plain parquet files
> >       - output latest records via Spark dataframe writer
> >       - remove Hudi metadata fields
> >       - support custom repartition requirements
> >
> > Is this a good idea to start an RFC?
> >
> > Thank you.
> >
> > Regards,
> > Raymond Xu
> >
>
