We can wait for others to chime in as well. :)

On Mon, Nov 11, 2019 at 4:37 PM Shiyan Xu <xu.shiyan.raym...@gmail.com> wrote:
> Yes, Vinoth, you're right that it is more of an exporter, which exports a
> snapshot from Hudi dataset.
>
> It should support MOR too; it shall just leverage on existing
> SnapshotCopier logic to find the latest file slices.
>
> So is it good to create a RFC for further discussion?
>
> On Mon, Nov 11, 2019 at 4:31 PM Vinoth Chandar <vin...@apache.org> wrote:
>
> > What you suggest sounds more like an `Exporter` tool? I imagine you will
> > support MOR as well? +1 on the idea itself. It could be useful if plain
> > parquet snapshot was generated as a backup.
> >
> > On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu <xu.shiyan.raym...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > The existing SnapshotCopier under Hudi Utilities is a Hudi-to-Hudi copy
> > > and primarily for backup purpose.
> > >
> > > I would like to start a RFC for a more generic Hudi snapshotter, which
> > >
> > > - Supports existing SnapshotCopier features
> > > - Add option to export a Hudi dataset to plain parquet files
> > >    - output latest records via Spark dataframe writer
> > >    - remove Hudi metadata fields
> > >    - support custom repartition requirements
> > >
> > > Is this a good idea to start an RFC?
> > >
> > > Thank you.
> > >
> > > Regards,
> > > Raymond Xu