Hi Shiyan,

+1 for this proposal. Also, it looks like an exporter tool.
@Vinoth Chandar <vin...@apache.org> Any thoughts about where to place it?

Best,
Vino

On Tue, Nov 12, 2019 at 8:58 AM Vinoth Chandar <vin...@apache.org> wrote:

> We can wait for others to chime in as well. :)
>
> On Mon, Nov 11, 2019 at 4:37 PM Shiyan Xu <xu.shiyan.raym...@gmail.com>
> wrote:
>
> > Yes, Vinoth, you're right that it is more of an exporter, which exports a
> > snapshot from Hudi dataset.
> >
> > It should support MOR too; it shall just leverage on existing
> > SnapshotCopier logic to find the latest file slices.
> >
> > So is it good to create a RFC for further discussion?
> >
> > On Mon, Nov 11, 2019 at 4:31 PM Vinoth Chandar <vin...@apache.org>
> > wrote:
> >
> > > What you suggest sounds more like an `Exporter` tool? I imagine you
> > > will support MOR as well? +1 on the idea itself. It could be useful
> > > if plain parquet snapshot was generated as a backup.
> > >
> > > On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu
> > > <xu.shiyan.raym...@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > The existing SnapshotCopier under Hudi Utilities is a Hudi-to-Hudi
> > > > copy and primarily for backup purpose.
> > > >
> > > > I would like to start a RFC for a more generic Hudi snapshotter,
> > > > which
> > > >
> > > > - Supports existing SnapshotCopier features
> > > > - Add option to export a Hudi dataset to plain parquet files
> > > >   - output latest records via Spark dataframe writer
> > > >   - remove Hudi metadata fields
> > > >   - support custom repartition requirements
> > > >
> > > > Is this a good idea to start an RFC?
> > > >
> > > > Thank you.
> > > >
> > > > Regards,
> > > > Raymond Xu