Came up with the first draft. Thank you.

https://cwiki.apache.org/confluence/display/HUDI/RFC-9%3A+%28WIP%29+Hudi+Dataset+Snapshotter

On Tue, Nov 12, 2019 at 12:44 PM Shiyan Xu <[email protected]> wrote:

> Thank you all for the +1s! I'll go ahead add a RFC page then.
>
> On Tue, Nov 12, 2019 at 8:41 AM nishith agarwal <[email protected]> wrote:
>
>> +1 on the exporter tool idea.
>>
>> -Nishith
>>
>> On Tue, Nov 12, 2019 at 5:06 AM leesf <[email protected]> wrote:
>>
>> > +1. and we would discuss it further when design docs are available.
>> >
>> > Best,
>> > Leesf
>> >
>> > On Tue, Nov 12, 2019 at 4:17 PM Balaji Varadarajan <[email protected]> wrote:
>> >
>> > > +1 on the exporter tool idea.
>> > >
>> > > On Mon, Nov 11, 2019 at 10:36 PM vino yang <[email protected]> wrote:
>> > >
>> > > > Hi Shiyan,
>> > > >
>> > > > +1 for this proposal, Also, it looks like an exporter tool.
>> > > >
>> > > > @Vinoth Chandar <[email protected]> Any thoughts about where to place it?
>> > > >
>> > > > Best,
>> > > > Vino
>> > > >
>> > > > On Tue, Nov 12, 2019 at 8:58 AM Vinoth Chandar <[email protected]> wrote:
>> > > >
>> > > > > We can wait for others to chime in as well. :)
>> > > > >
>> > > > > On Mon, Nov 11, 2019 at 4:37 PM Shiyan Xu <[email protected]> wrote:
>> > > > >
>> > > > > > Yes, Vinoth, you're right that it is more of an exporter, which exports a
>> > > > > > snapshot from Hudi dataset.
>> > > > > >
>> > > > > > It should support MOR too; it shall just leverage on existing
>> > > > > > SnapshotCopier logic to find the latest file slices.
>> > > > > >
>> > > > > > So is it good to create a RFC for further discussion?
>> > > > > >
>> > > > > > On Mon, Nov 11, 2019 at 4:31 PM Vinoth Chandar <[email protected]> wrote:
>> > > > > >
>> > > > > > > What you suggest sounds more like an `Exporter` tool? I imagine you will
>> > > > > > > support MOR as well? +1 on the idea itself. It could be useful if plain
>> > > > > > > parquet snapshot was generated as a backup.
>> > > > > > >
>> > > > > > > On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu <[email protected]> wrote:
>> > > > > > >
>> > > > > > > > Hi All,
>> > > > > > > >
>> > > > > > > > The existing SnapshotCopier under Hudi Utilities is a Hudi-to-Hudi copy and
>> > > > > > > > primarily for backup purpose.
>> > > > > > > >
>> > > > > > > > I would like to start a RFC for a more generic Hudi snapshotter, which
>> > > > > > > >
>> > > > > > > > - Supports existing SnapshotCopier features
>> > > > > > > > - Add option to export a Hudi dataset to plain parquet files
>> > > > > > > > - output latest records via Spark dataframe writer
>> > > > > > > > - remove Hudi metadata fields
>> > > > > > > > - support custom repartition requirements
>> > > > > > > >
>> > > > > > > > Is this a good idea to start an RFC?
>> > > > > > > >
>> > > > > > > > Thank you.
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > > Raymond Xu
