Re: [Discussion] Please vote and comment for carbon data file format change

Jean-Baptiste Onofré Sat, 10 Dec 2016 00:38:48 -0800

+1

Regards
JB


On Dec 10, 2016, at 09:33, "bill.zhou" <[email protected]> wrote:
>+1. This modification will help in all scenarios.
>
>Kumar Vishal wrote
>> Hello All,
>> 
>> Improving carbon first-time query performance
>> 
>> Reason:
>> 1. When the file system cache has been cleared, reading and caching the
>> file is slower.
>> 2. On a first-time query, carbon has to read the footer from the data file
>> to form the btree.
>> 3. Carbon reads more footer data than is required (data chunk).
>> 4. Many random seeks happen in carbon because column data (data page, rle,
>> inverted index) is not stored together.
>> 
>> Solution:
>> 1. Improve block loading time. This can be done by removing the data chunk
>> from blockletInfo and storing only the offset and length of the data chunk.
>> 2. Compress the presence meta bitset stored for null values of measure
>> columns using snappy.
>> 3. Store the metadata and data of a column together and read them together;
>> this reduces random seeks and improves IO.
>> 
>> For this I am planning to change the carbondata thrift format.
>> 
>> *Old format*
>> 
>> 
>> 
>> *New format*
>> 
>> 
>> 
>> 
>> Please vote and comment on this new format change.
>> 
>> -Regards
>> Kumar Vishal
>
>
>
>
>
>--
>View this message in context:
>http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-Please-vote-and-comment-for-carbon-data-file-format-change-tp2491p4049.html
>Sent from the Apache CarbonData Mailing List archive mailing list
>archive at Nabble.com.
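
For readers of the archive, points 1 and 3 of the proposed solution can be illustrated with a rough sketch. This is not the carbondata thrift format or any CarbonData API: `ChunkRef`, `write_blocklet`, `read_column`, and the length-prefixed chunk layout are all hypothetical names and choices made up for illustration. The idea shown is only that the footer keeps an offset and a length per column, and that a column's metadata and data sit next to each other so one seek and one read fetch both.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical footer entry: where a column chunk lives, not its contents."""
    offset: int
    length: int


def write_blocklet(columns):
    """Write each column's metadata and data contiguously.

    Returns the file bytes and a footer mapping column name -> ChunkRef.
    """
    buf = io.BytesIO()
    refs = {}
    for name, (meta, data) in columns.items():
        start = buf.tell()
        # Length-prefix the metadata so the reader can split the chunk again
        # (an assumed layout, chosen only to keep the sketch self-contained).
        buf.write(len(meta).to_bytes(4, "big"))
        buf.write(meta)
        buf.write(data)
        refs[name] = ChunkRef(offset=start, length=buf.tell() - start)
    return buf.getvalue(), refs


def read_column(file_bytes, ref):
    """Fetch metadata and data of one column with a single seek and read."""
    f = io.BytesIO(file_bytes)
    f.seek(ref.offset)
    chunk = f.read(ref.length)
    meta_len = int.from_bytes(chunk[:4], "big")
    return chunk[4:4 + meta_len], chunk[4 + meta_len:]


columns = {
    "city": (b"rle=1", b"\x01\x02\x03"),
    "sales": (b"nulls=0", b"\x0a\x0b"),
}
file_bytes, footer = write_blocklet(columns)
meta, data = read_column(file_bytes, footer["sales"])
```

In this sketch the footer stays small because it holds two integers per column, and the per-column metadata is only read when that column is actually queried.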
