Re: [Discussion] Please vote and comment for carbon data file format change

Kumar Vishal Tue, 29 Nov 2016 05:45:30 -0800

Hi Jihong Ma,
Please find the attachment.

-Regards
Kumar Vishal


On Fri, Nov 4, 2016 at 12:16 AM, Jihong Ma <[email protected]> wrote:

> Hi Kumar,
>
> Please place the proposed format changes in an attachment or attach them
> to the associated JIRA; I would like to take a look.
>
> Thanks!
>
> Jihong
>
> -----Original Message-----
> From: Jacky Li [mailto:[email protected]]
> Sent: Thursday, November 03, 2016 7:54 AM
> To: [email protected]
> Subject: Re: [Discussion] Please vote and comment for carbon data file
> format change
>
> The proposed change is reasonable, +1.
> But is there a plan to make the reader backward compatible with the old
> format, so that the impact on current deployments is minimal?
>
> Regards,
> Jacky
>
> > On Nov 2, 2016, at 12:38 AM, Kumar Vishal <[email protected]> wrote:
> >
> >  Hi Xiaoqiao He,
> >
> > Please find the attachment.
> >
> > -Regards
> > Kumar Vishal
> >
> > On Tue, Nov 1, 2016 at 9:27 PM, Xiaoqiao He <[email protected]> wrote:
> > Hi Kumar Vishal,
> >
> > I couldn't see the figures of the file format; could you re-upload them?
> > Thanks.
> >
> > Best Regards
> >
> > On Tue, Nov 1, 2016 at 7:12 PM, Kumar Vishal <[email protected]> wrote:
> >
> > >
> > > Hello All,
> > >
> > > > Improving carbon first-time query performance
> > > >
> > > > Reason:
> > > > 1. When the file system cache has been cleared, the file must be
> > > > read and cached again, which is slower.
> > > > 2. On the first query, carbon has to read the footer from the data
> > > > file to build the btree.
> > > > 3. Carbon reads more footer data than it requires (the data chunk).
> > > > 4. Carbon performs many random seeks because the parts of a column's
> > > > data (data page, rle, inverted index) are not stored together.
> > >
> > > Solution:
> > > > 1. Improve block loading time. This can be done by removing the data
> > > > chunk from blockletInfo and storing only the offset and length of
> > > > the data chunk.
> > > > 2. Compress the presence meta bitset stored for null values of
> > > > measure columns using snappy.
> > > > 3. Store the metadata and data of a column together and read them
> > > > together; this reduces random seeks and improves IO.
> > > >
> > > > For this I am planning to change the carbondata thrift format.
> > >
> > > > *Old format*
> > > >
> > > > *New format*
> > > >
> > > > Please vote and comment on this new format change.
> > >
> > > -Regards
> > > Kumar Vishal
> > >
> > >
> > >
> > >
> >
>
>
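
Points 1 and 3 of the proposed solution (keep only the offset and length of
each data chunk in the footer, and store a column's metadata next to its
data) can be sketched roughly as below. This is a minimal Python
illustration, not the CarbonData thrift format: the names ChunkRef,
write_blocklet and read_column, and the [meta_len][meta][data] layout, are
assumptions made up for the example.

```python
import io
import struct
from dataclasses import dataclass


@dataclass
class ChunkRef:
    """Hypothetical footer entry: only where a column chunk lives."""
    offset: int
    length: int


def write_blocklet(out, columns):
    """Write each column as [meta_len][meta][data] in one contiguous region.

    columns is a list of (meta_bytes, data_bytes) pairs. Returns the list of
    ChunkRef entries that a footer would store instead of the chunk metadata.
    """
    refs = []
    for meta, data in columns:
        start = out.tell()
        out.write(struct.pack("<I", len(meta)))  # 4-byte little-endian length
        out.write(meta)
        out.write(data)
        refs.append(ChunkRef(start, out.tell() - start))
    return refs


def read_column(src, ref):
    """Fetch a column's metadata and data with one seek and one read."""
    src.seek(ref.offset)
    buf = src.read(ref.length)
    (meta_len,) = struct.unpack_from("<I", buf, 0)
    meta = buf[4:4 + meta_len]
    data = buf[4 + meta_len:]
    return meta, data


# Usage: two columns written to an in-memory "file", second one read back.
f = io.BytesIO()
refs = write_blocklet(f, [(b"rle=0", b"\x01\x02\x03"), (b"rle=1", b"\x09\x09")])
meta, data = read_column(f, refs[1])
```

With this layout the footer stays small (two integers per column), and a
reader that needs one column issues a single contiguous read rather than
separate seeks for the data page, rle and inverted index.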
