Hi Ravi,

Thank you for bringing this discussion to the mailing list. I have one question: how do we ensure backward compatibility after introducing the new format?
Regards,
Liang


Jean-Baptiste Onofré wrote:
> Agree.
>
> +1
>
> Regards
> JB
>
> On Feb 15, 2017, at 09:09, Kumar Vishal <kumarvishal1802@> wrote:
>> +1
>> This will improve the IO bottleneck. Page-level min/max will improve
>> block pruning, and fewer false-positive blocks will improve filter
>> query performance. Separating decompression of data from the reader
>> layer will improve overall query performance.
>>
>> -Regards
>> Kumar Vishal
>>
>> On Wed, Feb 15, 2017 at 7:50 PM, Ravindra Pesala <ravi.pesala@>
>> wrote:
>>
>>> Please find the thrift file at the location below:
>>> https://drive.google.com/open?id=0B4TWTVbFSTnqZEdDRHRncVItQ242b1NqSTU2b2g4dkhkVDRj
>>>
>>> On 15 February 2017 at 17:14, Ravindra Pesala <ravi.pesala@> wrote:
>>>
>>> > Problems in the current format:
>>> > 1. IO read is slow because it needs multiple seeks on the file to
>>> > read the column blocklets. The current blocklet size is 120,000
>>> > rows, so scanning one column has to read from the file multiple
>>> > times. Alternatively we could increase the blocklet size, but that
>>> > hurts filter queries, since each filter then has to process a big
>>> > blocklet.
>>> > 2. Decompression is slow in the current format. We use an inverted
>>> > index for faster filter queries and compress that index with
>>> > NumberCompressor using bit-wise packing. This is slow, so we
>>> > should avoid NumberCompressor. One alternative is to keep the
>>> > blocklet size within 32,000 rows so the inverted index can be
>>> > written as shorts, but then IO read suffers a lot.
>>> >
>>> > To overcome these two issues we are introducing the new format V3.
>>> > Here each blocklet has multiple pages of 32,000 rows each, and the
>>> > number of pages per blocklet is configurable. Since a page stays
>>> > within the short limit, there is no need to compress the inverted
>>> > index. We also maintain max/min values for each page to further
>>> > prune filter queries.
>>> > A blocklet is read with all its pages at once and kept in offheap
>>> > memory. During filtering, we first check the max/min range, and
>>> > only if it is valid do we decompress the page to filter further.
>>> >
>>> > Please find the attached V3 format thrift file.
>>> >
>>> > --
>>> > Thanks & Regards,
>>> > Ravi
>>>
>>> --
>>> Thanks & Regards,
>>> Ravi
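
To make the second point above concrete: with pages capped at 32,000 rows, every page-local row id fits in a signed 16-bit short (max 32,767), so the inverted index can be serialized as plain 2-byte values with no bit packing. Below is a minimal Java sketch of that idea; the class and method names are hypothetical and not taken from the actual CarbonData code:

import java.nio.ByteBuffer;

// Illustrative only: with at most 32,000 rows per page, each row id fits
// in a signed 16-bit short, so the inverted index needs no bit-packed
// encoding such as NumberCompressor.
public class InvertedIndexSketch {

    static final int PAGE_SIZE = 32_000; // rows per page, as in the proposal

    // Serializes a page-local inverted index as plain shorts (2 bytes each).
    static ByteBuffer writeInvertedIndex(int[] rowIds) {
        ByteBuffer buffer = ByteBuffer.allocate(rowIds.length * Short.BYTES);
        for (int rowId : rowIds) {
            // Safe because 0 <= rowId < 32,000 <= Short.MAX_VALUE (32,767).
            buffer.putShort((short) rowId);
        }
        buffer.flip();
        return buffer;
    }
}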
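
Similarly, a minimal sketch of the page-level min/max pruning described for V3: the reader checks each page's min/max against the filter range and decompresses only the pages that can possibly match. Again, all names here are hypothetical illustrations, not the actual CarbonData classes or the thrift definitions from the linked file:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of V3-style page pruning: each page carries its own
// min/max, and a page is decompressed only when the filter range overlaps it.
public class PagePruningSketch {

    // Per-page metadata as described in the proposal: min/max kept for the
    // filtered column, data kept compressed (in offheap memory) until needed.
    static final class PageMeta {
        final long min;
        final long max;
        final byte[] compressedData; // stays compressed until the page matches

        PageMeta(long min, long max, byte[] compressedData) {
            this.min = min;
            this.max = max;
            this.compressedData = compressedData;
        }
    }

    // Returns the pages of one blocklet that can possibly match the filter
    // range [filterMin, filterMax]; all other pages are skipped without
    // ever being decompressed.
    static List<PageMeta> prunePages(List<PageMeta> blockletPages,
                                     long filterMin, long filterMax) {
        List<PageMeta> candidates = new ArrayList<>();
        for (PageMeta page : blockletPages) {
            // Overlap test: skip the page when its [min, max] range cannot
            // contain any value in [filterMin, filterMax].
            if (page.max >= filterMin && page.min <= filterMax) {
                candidates.add(page); // decompress and filter row-by-row later
            }
        }
        return candidates;
    }
}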
