Please find the Thrift file at the location below:
https://drive.google.com/open?id=0B4TWTVbFSTnqZEdDRHRncVItQ242b1NqSTU2b2g4dkhkVDRj

On 15 February 2017 at 17:14, Ravindra Pesala <[email protected]> wrote:

> Problems in the current format:
> 1. IO read is slower because it needs multiple seeks on the file to
> read column blocklets. The current blocklet size is 120000, so it has
> to read from the file multiple times to scan the data of a column.
> Alternatively we could increase the blocklet size, but filter queries
> would suffer because they would get a big blocklet to filter.
> 2. Decompression is slower in the current format. We use an inverted
> index for faster filter queries and use NumberCompressor to compress
> the inverted index with bit-wise packing. This is slow, so we should
> avoid the number compressor. One alternative is to keep the blocklet
> size within 32000 so that the inverted index can be written as short
> values, but then IO read suffers a lot.
>
> To overcome the above two issues we are introducing a new format, V3.
> Here each blocklet has multiple pages of size 32000, and the number of
> pages in a blocklet is configurable. Since we keep the page within the
> short limit, there is no need to compress the inverted index.
> We also maintain the max/min for each page to further prune filter
> queries.
> The blocklet is read with all its pages at once and kept in off-heap
> memory.
> During a filter, first check the max/min range, and only if it is
> valid decompress the page to filter further.
>
> Please find the attached V3 format thrift file.
>
> --
> Thanks & Regards,
> Ravi
>

--
Thanks & Regards,
Ravi
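
To illustrate the page-level max/min pruning described in the quoted mail, here is a minimal Python sketch. It is not the actual CarbonData code: the names Page, build_blocklet and filter_equals are hypothetical, plain Python lists stand in for compressed pages, and only an equality filter is shown. The page size of 32000 comes from the mail; the largest row offset inside a page (31999) fits in a signed 16-bit short (max 32767), which is why the inverted index would not need NumberCompressor.

```python
from dataclasses import dataclass
from typing import List, Tuple

PAGE_SIZE = 32000   # rows per page, as stated in the mail
SHORT_MAX = 32767   # largest value of a signed 16-bit short


@dataclass
class Page:
    values: List[int]   # stands in for the (compressed) column page
    min_value: int      # page-level statistics used for pruning
    max_value: int


def build_blocklet(column: List[int], page_size: int = PAGE_SIZE) -> List[Page]:
    """Split one column into fixed-size pages and record min/max per page."""
    pages = []
    for start in range(0, len(column), page_size):
        chunk = column[start:start + page_size]
        pages.append(Page(chunk, min(chunk), max(chunk)))
    return pages


def filter_equals(pages: List[Page], target: int) -> Tuple[List[int], int]:
    """Return (matching row ids, number of pages actually scanned)."""
    matches: List[int] = []
    scanned = 0
    base = 0  # row id of the first row in the current page
    for page in pages:
        # Check the max/min range first; skip the page if target cannot be in it.
        if page.min_value <= target <= page.max_value:
            scanned += 1  # only here would a real reader decompress the page
            matches.extend(base + i for i, v in enumerate(page.values) if v == target)
        base += len(page.values)
    return matches, scanned


if __name__ == "__main__":
    column = list(range(100000))      # sorted sample data, an assumption
    pages = build_blocklet(column)
    rows, scanned = filter_equals(pages, 70000)
    print(len(pages), rows, scanned)
```

With sorted sample data only one page overlaps the filter value, so the other pages are skipped without being touched. With unsorted data the page ranges overlap more and the pruning is correspondingly less effective.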
