Hi Kumar Vishal,

I couldn't see the figures of the file format. Could you re-upload them? Thanks.
Best Regards

On Tue, Nov 1, 2016 at 7:12 PM, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
>
> Hello All,
>
> Improving carbon first time query performance
>
> Reason:
> 1. As the file system cache is cleared, file reading is slower because
> the data has to be read and cached again
> 2. In a first time query, carbon has to read the footer from the data
> file to form the btree
> 3. Carbon reads more footer data than is required (data chunk)
> 4. Lots of random seeks happen in carbon because column data (data page,
> rle, inverted index) is not stored together
>
> Solution:
> 1. Improve block loading time. This can be done by removing the data chunk
> from blockletInfo and storing only the offset and length of the data chunk
> 2. Compress the presence meta bitset stored for null values of measure
> columns using snappy
> 3. Store the metadata and data of a column together and read them together.
> This reduces random seeks and improves IO
>
> For this I am planning to change the carbondata thrift format
>
> *Old format*
>
>
>
> *New format*
>
>
>
> Please vote and comment on this new format change
>
> -Regards
> Kumar Vishal
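
Until the figures are available again, solution points 1 and 3 of the quoted proposal can be illustrated with a small sketch. This is not CarbonData code and does not reflect the actual thrift format: the function names `write_blocklet` and `read_column`, the 4-byte length prefix, and the in-memory "file" are all assumptions made for illustration. The idea shown is that the footer index keeps only an (offset, length) pair per column, and each column's metadata and data page sit next to each other so one seek and one read fetch both.

```python
import io
import struct


def write_blocklet(columns):
    """Write each column's metadata and data page contiguously.

    columns: dict mapping column name -> (meta_bytes, page_bytes).
    Returns (file_bytes, index), where index maps column name to
    (offset, length). The index stands in for the slimmed-down footer
    (hypothetical layout, not the real carbondata format).
    """
    buf = io.BytesIO()
    index = {}
    for name, (meta, page) in columns.items():
        offset = buf.tell()
        # 4-byte big-endian metadata length, then metadata, then data page.
        buf.write(struct.pack(">I", len(meta)))
        buf.write(meta)
        buf.write(page)
        index[name] = (offset, buf.tell() - offset)
    return buf.getvalue(), index


def read_column(file_bytes, index, name):
    """Fetch one column's metadata and data page with one seek and one read."""
    offset, length = index[name]
    f = io.BytesIO(file_bytes)
    f.seek(offset)          # single seek to the start of the column chunk
    chunk = f.read(length)  # single contiguous read of metadata + data
    (meta_len,) = struct.unpack_from(">I", chunk, 0)
    meta = chunk[4:4 + meta_len]
    page = chunk[4 + meta_len:]
    return meta, page
```

With this layout the footer grows by a fixed two integers per column regardless of how large the per-column metadata is, which is the property point 1 relies on to shorten block loading.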