Hi Jihong Ma,

Please find the attachment.

-Regards
Kumar Vishal

On Fri, Nov 4, 2016 at 12:16 AM, Jihong Ma <[email protected]> wrote:

> Hi Kumar,
>
> Please place the proposed format changes in an attachment or attach them
> to the associated JIRA; I would like to take a look.
>
> Thanks!
>
> Jihong
>
> -----Original Message-----
> From: Jacky Li [mailto:[email protected]]
> Sent: Thursday, November 03, 2016 7:54 AM
> To: [email protected]
> Subject: Re: [Discussion] Please vote and comment for carbon data file
> format change
>
> The proposed change is reasonable, +1.
> But is there a plan to make the reader backward compatible with the old
> format, so that the impact on current deployments is minimal?
>
> Regards,
> Jacky
>
> > On Nov 2, 2016, at 12:38 AM, Kumar Vishal <[email protected]> wrote:
> >
> > Hi Xiaoqiao He,
> >
> > Please find the attachment.
> >
> > -Regards
> > Kumar Vishal
> >
> > On Tue, Nov 1, 2016 at 9:27 PM, Xiaoqiao He <[email protected]> wrote:
> > Hi Kumar Vishal,
> >
> > I couldn't get the figures of the file format; could you re-upload them?
> > Thanks.
> >
> > Best Regards
> >
> > On Tue, Nov 1, 2016 at 7:12 PM, Kumar Vishal <[email protected]>
> > wrote:
> >
> > >
> > > Hello All,
> > >
> > > Improving carbon first time query performance
> > >
> > > Reason:
> > > 1. When the file system cache has been cleared, files must be read
> > > from disk, which makes reading and caching slower.
> > > 2. In a first-time query, carbon has to read the footer from the data
> > > file to form the btree.
> > > 3. Carbon reads more footer data than is required (data chunk).
> > > 4. Many random seeks happen in carbon because column data (data page,
> > > rle, inverted index) are not stored together.
> > >
> > > Solution:
> > > 1. Improve block loading time. This can be done by removing the data
> > > chunk from blockletInfo and storing only the offset and length of the
> > > data chunk.
> > > 2. Compress the presence meta bitset, stored for null values of
> > > measure columns, using snappy.
> > > 3. Store the metadata and data of a column together and read them
> > > together; this reduces random seeks and improves IO.
> > >
> > > For this I am planning to change the carbondata thrift format.
> > >
> > > *Old format*
> > >
> > > *New format*
> > >
> > > Please vote and comment on this new format change.
> > >
> > > -Regards
> > > Kumar Vishal
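
To make solution points 1 and 3 above concrete, here is a minimal sketch in Python. It is not the carbondata thrift format or reader: the names ChunkRef, write_blocklet and read_column, and the 4-byte length prefix, are hypothetical and chosen only for illustration. The idea shown is that the footer index holds just an offset and a length per column, and that each column's metadata and data sit next to each other so one seek and one read fetch both.

```python
import io
import struct
from dataclasses import dataclass


@dataclass
class ChunkRef:
    """Hypothetical footer entry: where a column chunk lives, not the chunk itself."""
    offset: int
    length: int


def write_blocklet(columns):
    """Lay out each column as [meta length][metadata][data], back to back.

    columns maps a column name to a (metadata bytes, data bytes) pair.
    Returns the blocklet body and a small index of ChunkRef entries.
    """
    body = bytearray()
    index = {}
    for name, (meta, data) in columns.items():
        start = len(body)
        body += struct.pack("<I", len(meta))  # assumed 4-byte little-endian prefix
        body += meta
        body += data
        index[name] = ChunkRef(offset=start, length=len(body) - start)
    return bytes(body), index


def read_column(f, ref):
    """Fetch one column's metadata and data with a single seek and a single read."""
    f.seek(ref.offset)
    chunk = f.read(ref.length)
    (meta_len,) = struct.unpack_from("<I", chunk, 0)
    meta = chunk[4:4 + meta_len]
    data = chunk[4 + meta_len:]
    return meta, data


# Placeholder payloads; real chunks would hold encoded pages, rle, inverted index.
columns = {
    "c1": (b"rle+inverted-index", b"\x01\x02\x03"),
    "m1": (b"presence-bitset", b"\x0a\x0b"),
}
body, index = write_blocklet(columns)
meta, data = read_column(io.BytesIO(body), index["m1"])
print(index["m1"], meta, data)
```

With this layout the footer stays small regardless of chunk size, and a query that touches only m1 never reads the bytes of c1. Snappy compression of the presence bitset (solution point 2) is left out here because it needs a third-party library.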
