+1, this modification will help in all scenarios.

Kumar Vishal wrote
> Hello All,
> 
> Improving CarbonData first-time query performance
> 
> Reasons:
> 1. When the file system cache has been cleared, reading and caching the files is slower.
> 2. On a first-time query, Carbon has to read the footer from the data file to build the B-tree.
> 3. Carbon reads more footer data than it requires (the data chunk).
> 4. Many random seeks occur because a column's data (data page, RLE, inverted index) is not stored together.
> 
> Solution:
> 1. Improve block loading time by removing the data chunk from blockletInfo and storing only its offset and length.
> 2. Compress the presence meta bitset, which records null values for measure columns, using Snappy.
> 3. Store a column's metadata and data together and read them together; this reduces random seeks and improves IO.
> 
> For this, I am planning to change the CarbonData Thrift format.
> 
> *Old format*
> 
> 
> 
> *New format*
> 
> 
> 
> 
> Please vote and comment on this new format change.
> 
> -Regards
> Kumar Vishal
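To make the three proposed solutions in the quoted message concrete, here is a minimal Python sketch under stated assumptions. It is not the CarbonData implementation or its Thrift schema: the byte layout, `ChunkRef`, `HEADER` and the helper functions are hypothetical, and `zlib` from the standard library stands in for Snappy so the sketch has no third-party dependency. The footer keeps only an (offset, length) per column chunk (point 1), the null bitset is compressed (point 2), and each column's metadata and data sit in one contiguous chunk so a single seek and read fetch them (point 3).

```python
import io
import struct
import zlib
from dataclasses import dataclass

# Hypothetical per-column header: row count, then byte lengths of the data
# page, RLE block, inverted index and compressed null bitset.
HEADER = struct.Struct('<IIIII')


@dataclass
class ChunkRef:
    """Hypothetical footer entry: where a column chunk lives, not the chunk itself."""
    offset: int
    length: int


def pack_bits(flags):
    """Pack booleans into a bitset where bit i marks row i as null."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bits(data, n):
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(n)]


def write_file(columns):
    """Write each column's metadata and payload as one contiguous chunk,
    then a footer holding only (offset, length) per chunk."""
    buf = io.BytesIO()
    refs = []
    for col in columns:
        # zlib is an assumed stand-in for Snappy (stdlib only).
        nulls = zlib.compress(pack_bits(col['nulls']))
        start = buf.tell()
        buf.write(HEADER.pack(len(col['nulls']), len(col['data']),
                              len(col['rle']), len(col['index']), len(nulls)))
        buf.write(col['data'] + col['rle'] + col['index'] + nulls)
        refs.append(ChunkRef(start, buf.tell() - start))
    footer_offset = buf.tell()
    buf.write(struct.pack('<I', len(refs)))
    for ref in refs:
        buf.write(struct.pack('<QQ', ref.offset, ref.length))
    # Trailing pointer so a reader can locate the footer from the file end.
    buf.write(struct.pack('<Q', footer_offset))
    return buf.getvalue()


def read_footer(f):
    """Load only the small (offset, length) entries, not the column data."""
    f.seek(-8, io.SEEK_END)
    (footer_offset,) = struct.unpack('<Q', f.read(8))
    f.seek(footer_offset)
    (count,) = struct.unpack('<I', f.read(4))
    return [ChunkRef(*struct.unpack('<QQ', f.read(16))) for _ in range(count)]


def read_column(f, ref):
    """One seek and one contiguous read fetch a column's metadata and data."""
    f.seek(ref.offset)
    chunk = f.read(ref.length)
    rows, n_data, n_rle, n_index, n_nulls = HEADER.unpack_from(chunk, 0)
    pos = HEADER.size
    data = chunk[pos:pos + n_data]
    pos += n_data
    rle = chunk[pos:pos + n_rle]
    pos += n_rle
    index = chunk[pos:pos + n_index]
    pos += n_index
    nulls = unpack_bits(zlib.decompress(chunk[pos:pos + n_nulls]), rows)
    return data, rle, index, nulls


columns = [
    {'data': b'\x01\x02\x03', 'rle': b'\x03\x01', 'index': b'\x00\x01\x02',
     'nulls': [False, True, False]},
    {'data': b'hello', 'rle': b'', 'index': b'', 'nulls': [False] * 5},
]
f = io.BytesIO(write_file(columns))
refs = read_footer(f)
data, rle, index, nulls = read_column(f, refs[0])
print(data, nulls)  # b'\x01\x02\x03' [False, True, False]
```

In this sketch, block loading touches only fixed-size footer entries, while the per-column header is parsed lazily when that column is first read; that split is the trade-off the proposal appears to aim for.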





--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-Please-vote-and-comment-for-carbon-data-file-format-change-tp2491p4049.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.
