Hi All,

Please find the JIRA issue which I have raised for the above discussion:
https://issues.apache.org/jira/browse/CARBONDATA-458

-Regards
Kumar Vishal

On Tue, Nov 29, 2016 at 7:14 PM, Kumar Vishal <[email protected]> wrote:

> Hi Jihong Ma,
> Please find the attachment.
>
> -Regards
> Kumar Vishal
>
> On Fri, Nov 4, 2016 at 12:16 AM, Jihong Ma <[email protected]> wrote:
>
>> Hi Kumar,
>>
>> Please place the proposed format changes in an attachment or attach them
>> to the associated JIRA; I would like to take a look.
>>
>> Thanks!
>>
>> Jihong
>>
>> -----Original Message-----
>> From: Jacky Li [mailto:[email protected]]
>> Sent: Thursday, November 03, 2016 7:54 AM
>> To: [email protected]
>> Subject: Re: [Discussion] Please vote and comment for carbon data file
>> format change
>>
>> The proposed change is reasonable, +1.
>> But is there a plan to make the reader backward compatible with the old
>> format, so that the impact on current deployments is minimal?
>>
>> Regards,
>> Jacky
>>
>> > On Nov 2, 2016, at 12:38 AM, Kumar Vishal <[email protected]> wrote:
>> >
>> > Hi Xiaoqiao He,
>> >
>> > Please find the attachment.
>> >
>> > -Regards
>> > Kumar Vishal
>> >
>> > On Tue, Nov 1, 2016 at 9:27 PM, Xiaoqiao He <[email protected]> wrote:
>> > Hi Kumar Vishal,
>> >
>> > I couldn't get the figures of the file format; could you re-upload them?
>> > Thanks.
>> >
>> > Best Regards
>> >
>> > On Tue, Nov 1, 2016 at 7:12 PM, Kumar Vishal <[email protected]>
>> > wrote:
>> >
>> > > Hello All,
>> > >
>> > > Improving carbon first time query performance
>> > >
>> > > Reason:
>> > > 1. Because the file system cache is cleared, file reading is slower
>> > > while the data is read and cached.
>> > > 2. In a first-time query, Carbon has to read the footer from the data
>> > > file to build the B-tree.
>> > > 3. Carbon reads more footer data than is required (the data chunk).
>> > > 4. Many random seeks happen in Carbon because column data (data page,
>> > > RLE, inverted index) is not stored together.
>> > >
>> > > Solution:
>> > > 1. Improve block loading time. This can be done by removing the data
>> > > chunk from blockletInfo and storing only the offset and length of the
>> > > data chunk.
>> > > 2. Compress the presence meta bitset stored for null values of measure
>> > > columns using Snappy.
>> > > 3. Store the metadata and data of a column together and read them
>> > > together; this reduces random seeks and improves IO.
>> > >
>> > > For this I am planning to change the carbondata thrift format.
>> > >
>> > > *Old format*
>> > >
>> > > *New format*
>> > >
>> > > Please vote and comment on this new format change.
>> > >
>> > > -Regards
>> > > Kumar Vishal
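
To make solution points 1 and 3 of the quoted proposal a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the byte layout, the length-prefix encoding, and the function names (write_blocklet, read_index, read_column) are hypothetical and are not the actual CarbonData thrift format. The idea shown is that the footer index keeps only an (offset, length) pair per column chunk, and each chunk stores its metadata and data parts back to back, so a column can be fetched with one seek and one read.

```python
import io
import struct

# Hypothetical toy layout (NOT the real CarbonData format):
#   [column chunk 0][column chunk 1]...[index][index length: 4 bytes]
# Each column chunk holds its metadata and data parts contiguously.
# The index stores only (offset, length) per column chunk.


def pack_parts(parts):
    """Length-prefix each part so a chunk can be split after a single read."""
    return b"".join(struct.pack("<I", len(p)) + p for p in parts)


def unpack_parts(chunk):
    """Split a chunk produced by pack_parts back into its parts."""
    parts, pos = [], 0
    while pos < len(chunk):
        (n,) = struct.unpack_from("<I", chunk, pos)
        pos += 4
        parts.append(chunk[pos:pos + n])
        pos += n
    return parts


def write_blocklet(columns):
    """Write every column chunk, then an index of (name, offset, length)."""
    out = io.BytesIO()
    index = []
    for name, parts in columns.items():
        chunk = pack_parts(parts)
        index.append((name, out.tell(), len(chunk)))
        out.write(chunk)
    index_bytes = b"".join(
        struct.pack("<H", len(name.encode())) + name.encode()
        + struct.pack("<QI", offset, length)
        for name, offset, length in index
    )
    out.write(index_bytes)
    out.write(struct.pack("<I", len(index_bytes)))
    return out.getvalue()


def read_index(f):
    """Load only the small index; no column chunk is read at this point."""
    f.seek(-4, io.SEEK_END)
    (index_len,) = struct.unpack("<I", f.read(4))
    f.seek(-(4 + index_len), io.SEEK_END)
    raw = f.read(index_len)
    index, pos = {}, 0
    while pos < len(raw):
        (name_len,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        name = raw[pos:pos + name_len].decode()
        pos += name_len
        offset, length = struct.unpack_from("<QI", raw, pos)
        pos += 12  # 8-byte offset + 4-byte length
        index[name] = (offset, length)
    return index


def read_column(f, index, name):
    """One seek and one read return the metadata and data of a column."""
    offset, length = index[name]
    f.seek(offset)
    return unpack_parts(f.read(length))


if __name__ == "__main__":
    # Made-up column contents, for illustration only.
    columns = {
        "c1": [b"meta1", b"page1", b"rle1", b"inv1"],
        "c2": [b"meta2", b"page2"],
    }
    f = io.BytesIO(write_blocklet(columns))
    index = read_index(f)
    print(index)
    print(read_column(f, index, "c2"))
```

In this sketch, loading the index touches only the tail of the file, which corresponds to the block loading improvement in point 1, and a query that needs one column never reads the chunks of the others.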
