Re: [DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

nishith agarwal Mon, 24 Feb 2020 15:32:01 -0800

+100
- Reduces index lookup time and hence improves job runtime
- Paves the way for streaming-style ingestion
- Eliminates the dependency on HBase (the alternative for "global index"
support at the moment)


-Nishith

On Mon, Feb 24, 2020 at 10:56 AM Vinoth Chandar <[email protected]> wrote:

> +1 from me as well. This will be a product-defining feature, if we can do
> it.
>
> On Sun, Feb 23, 2020 at 6:27 PM vino yang <[email protected]> wrote:
>
> > Hi Sivabalan,
> >
> > Thanks for your proposal.
> >
> > Big +1 from my side. Record-level indexing is really good for
> > performance. It is also a step towards streaming processing.
> >
> > Best,
> > Vino
> >
> > Sivabalan <[email protected]> wrote on Sun, Feb 23, 2020 at 12:52 AM:
> >
> > > As Apache Hudi is getting widely adopted, performance has become the need
> > > of the hour. This RFC focuses on improving performance of the Hudi index
> > > by introducing a record-level index. The proposal is to implement a new
> > > index format that is a mapping of (recordKey <-> partition, fileId) or
> > > ((recordKey, partitionPath) → fileId). This mapping will be stored and
> > > maintained by Hudi as another implementation of HoodieIndex. This
> > > record-level indexing will definitely give a boost to both read and write
> > > performance.
> > >
> > > Here is the link to the RFC:
> > > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+08+%3A+Record+level+indexing+mechanisms+for+Hudi+datasets
> > >
> > > Appreciate your review and thoughts.
> > >
> > > --
> > > Regards,
> > > -Sivabalan
> > >
> >
>
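
The two mappings named in the quoted RFC summary, (recordKey <-> partition, fileId) and ((recordKey, partitionPath) → fileId), can be pictured with a minimal in-memory sketch. This is only an illustration under stated assumptions: the class `RecordLevelIndex`, its methods, and the helper `tag_records` are hypothetical names, not Hudi's HoodieIndex API, and the RFC's actual storage format is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RecordLevelIndex:
    """Hypothetical toy index; plain dicts stand in for the stored mapping."""

    # Global flavor: recordKey -> (partitionPath, fileId)
    global_map: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # Partitioned flavor: (recordKey, partitionPath) -> fileId
    partitioned_map: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def update(self, record_key: str, partition_path: str, file_id: str) -> None:
        """Record where a key was written (both flavors kept in sync)."""
        self.global_map[record_key] = (partition_path, file_id)
        self.partitioned_map[(record_key, partition_path)] = file_id

    def locate_global(self, record_key: str) -> Optional[Tuple[str, str]]:
        """Look up a key without knowing its partition."""
        return self.global_map.get(record_key)

    def locate_in_partition(self, record_key: str, partition_path: str) -> Optional[str]:
        """Look up a key within a known partition."""
        return self.partitioned_map.get((record_key, partition_path))


def tag_records(
    index: RecordLevelIndex, incoming: List[Tuple[str, str]]
) -> Tuple[List[Tuple[str, str, str]], List[Tuple[str, str]]]:
    """Split incoming (recordKey, partitionPath) pairs into updates and inserts."""
    updates, inserts = [], []
    for key, partition in incoming:
        file_id = index.locate_in_partition(key, partition)
        if file_id is None:
            inserts.append((key, partition))  # key not seen before: new record
        else:
            updates.append((key, partition, file_id))  # route to existing file
    return updates, inserts
```

In this sketch each lookup is a single dictionary access per record key, which is the kind of direct key-to-file lookup the thread credits with reducing index lookup time.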
