+1 on the exporter tool idea.

On Mon, Nov 11, 2019 at 10:36 PM vino yang <yanghua1...@gmail.com> wrote:

> Hi Shiyan,
>
> +1 for this proposal. Also, it looks like an exporter tool.
>
> @Vinoth Chandar <vin...@apache.org> Any thoughts about where to place it?
>
> Best,
> Vino
>
> On Tue, Nov 12, 2019 at 8:58 AM Vinoth Chandar <vin...@apache.org> wrote:
>
> > We can wait for others to chime in as well. :)
> >
> > On Mon, Nov 11, 2019 at 4:37 PM Shiyan Xu <xu.shiyan.raym...@gmail.com>
> > wrote:
> >
> > > Yes, Vinoth, you're right that it is more of an exporter, which
> > > exports a snapshot from a Hudi dataset.
> > >
> > > It should support MOR too; it would just leverage the existing
> > > SnapshotCopier logic to find the latest file slices.
> > >
> > > So is it good to create an RFC for further discussion?
> > >
> > > On Mon, Nov 11, 2019 at 4:31 PM Vinoth Chandar <vin...@apache.org>
> > > wrote:
> > >
> > > > What you suggest sounds more like an `Exporter` tool? I imagine you
> > > > will support MOR as well? +1 on the idea itself. It could be useful
> > > > if a plain parquet snapshot was generated as a backup.
> > > >
> > > > On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu
> > > > <xu.shiyan.raym...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > The existing SnapshotCopier under Hudi Utilities is a Hudi-to-Hudi
> > > > > copy, primarily for backup purposes.
> > > > >
> > > > > I would like to start an RFC for a more generic Hudi snapshotter,
> > > > > which
> > > > >
> > > > > - Supports existing SnapshotCopier features
> > > > > - Adds an option to export a Hudi dataset to plain parquet files
> > > > >   - output latest records via Spark dataframe writer
> > > > >   - remove Hudi metadata fields
> > > > >   - support custom repartition requirements
> > > > >
> > > > > Is it a good idea to start an RFC for this?
> > > > >
> > > > > Thank you.
> > > > >
> > > > > Regards,
> > > > > Raymond Xu