Hi Ravi,

Thank you for bringing this discussion to the mailing list. I have one question: how do we ensure backward compatibility after introducing the new format?
Regards,
Liang


Jean-Baptiste Onofré wrote:
> Agree.
>
> +1
>
> Regards
> JB
>
> On Feb 15, 2017, at 09:09, Kumar Vishal <kumarvishal1802@> wrote:
>> +1
>> This will improve the IO bottleneck. Page-level min/max will improve
>> block pruning, and fewer false-positive blocks will improve filter
>> query performance. Separating decompression of data from the reader
>> layer will improve overall query performance.
>>
>> -Regards
>> Kumar Vishal
>>
>> On Wed, Feb 15, 2017 at 7:50 PM, Ravindra Pesala <ravi.pesala@>
>> wrote:
>>
>>> Please find the thrift file at the location below:
>>> https://drive.google.com/open?id=0B4TWTVbFSTnqZEdDRHRncVItQ242b1NqSTU2b2g4dkhkVDRj
>>>
>>> On 15 February 2017 at 17:14, Ravindra Pesala <ravi.pesala@> wrote:
>>>
>>> > Problems in the current format:
>>> > 1. IO read is slow because it needs multiple seeks on the file to
>>> > read the column blocklets. The current blocklet size is 120,000
>>> > rows, so scanning one column has to read from the file multiple
>>> > times. Alternatively we could increase the blocklet size, but that
>>> > hurts filter queries, since each filter then has to process a big
>>> > blocklet.
>>> > 2. Decompression is slow in the current format. We use an inverted
>>> > index for faster filter queries and compress that index with
>>> > NumberCompressor using bit-wise packing. This is slow, so we
>>> > should avoid NumberCompressor. One alternative is to keep the
>>> > blocklet size within 32,000 rows so the inverted index can be
>>> > written as shorts, but then IO read suffers a lot.
>>> >
>>> > To overcome these two issues we are introducing the new format V3.
>>> > Here each blocklet has multiple pages of 32,000 rows each, and the
>>> > number of pages per blocklet is configurable. Since a page stays
>>> > within the short limit, there is no need to compress the inverted
>>> > index. We also maintain max/min values for each page to further
>>> > prune filter queries.
>>> > A blocklet is read with all its pages at once and kept in offheap
>>> > memory. During filtering, we first check the max/min range, and
>>> > only if it is valid do we decompress the page to filter further.
>>> >
>>> > Please find the attached V3 format thrift file.
>>> >
>>> > --
>>> > Thanks & Regards,
>>> > Ravi
>>>
>>> --
>>> Thanks & Regards,
>>> Ravi
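
To make the second point above concrete: with pages capped at 32,000 rows, every page-local row id fits in a signed 16-bit short (max 32,767), so the inverted index can be serialized as plain 2-byte values with no bit packing. Below is a minimal Java sketch of that idea; the class and method names are hypothetical and not taken from the actual CarbonData code:

import java.nio.ByteBuffer;

// Illustrative only: with at most 32,000 rows per page, each row id fits
// in a signed 16-bit short, so the inverted index needs no bit-packed
// encoding such as NumberCompressor.
public class InvertedIndexSketch {

    static final int PAGE_SIZE = 32_000; // rows per page, as in the proposal

    // Serializes a page-local inverted index as plain shorts (2 bytes each).
    static ByteBuffer writeInvertedIndex(int[] rowIds) {
        ByteBuffer buffer = ByteBuffer.allocate(rowIds.length * Short.BYTES);
        for (int rowId : rowIds) {
            // Safe because 0 <= rowId < 32,000 <= Short.MAX_VALUE (32,767).
            buffer.putShort((short) rowId);
        }
        buffer.flip();
        return buffer;
    }
}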
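
Similarly, a minimal sketch of the page-level min/max pruning described for V3: the reader checks each page's min/max against the filter range and decompresses only the pages that can possibly match. Again, all names here are hypothetical illustrations, not the actual CarbonData classes or the thrift definitions from the linked file:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of V3-style page pruning: each page carries its own
// min/max, and a page is decompressed only when the filter range overlaps it.
public class PagePruningSketch {

    // Per-page metadata as described in the proposal: min/max kept for the
    // filtered column, data kept compressed (in offheap memory) until needed.
    static final class PageMeta {
        final long min;
        final long max;
        final byte[] compressedData; // stays compressed until the page matches

        PageMeta(long min, long max, byte[] compressedData) {
            this.min = min;
            this.max = max;
            this.compressedData = compressedData;
        }
    }

    // Returns the pages of one blocklet that can possibly match the filter
    // range [filterMin, filterMax]; all other pages are skipped without
    // ever being decompressed.
    static List<PageMeta> prunePages(List<PageMeta> blockletPages,
                                     long filterMin, long filterMax) {
        List<PageMeta> candidates = new ArrayList<>();
        for (PageMeta page : blockletPages) {
            // Overlap test: skip the page when its [min, max] range cannot
            // contain any value in [filterMin, filterMax].
            if (page.max >= filterMin && page.min <= filterMax) {
                candidates.add(page); // decompress and filter row-by-row later
            }
        }
        return candidates;
    }
}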
