[
https://issues.apache.org/jira/browse/CARBONDATA-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jacky Li resolved CARBONDATA-726.
---------------------------------
Resolution: Fixed
Fix Version/s: 1.1.0-incubating
> Update with V3 format for better IO and processing optimization.
> ----------------------------------------------------------------
>
> Key: CARBONDATA-726
> URL: https://issues.apache.org/jira/browse/CARBONDATA-726
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Ravindra Pesala
> Fix For: 1.1.0-incubating
>
> Time Spent: 10h 10m
> Remaining Estimate: 0h
>
> Problems with the current format:
> 1. IO read is slow, since scanning a column requires multiple seeks on the
> file to read its blocklets. The current blocklet size is 120000 rows, so
> reading all the data for a column means several reads from the file.
> Alternatively we could increase the blocklet size, but that hurts filter
> queries, which then get a bigger blocklet to filter.
> 2. Decompression is slow in the current format. We use an inverted index for
> faster filter queries and compress it with NumberCompressor using bit-wise
> packing, which is slow, so we should avoid NumberCompressor. One alternative
> is to keep the blocklet size within 32000 rows so the inverted index can be
> written as shorts, but then IO read suffers a lot.
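A minimal sketch of the idea behind point 2 (hypothetical names, not CarbonData's actual writer code): once a page is capped at 32000 rows, every row position fits in a signed short (max 32767), so the inverted index can be stored at a fixed 2 bytes per entry with no bit-packing step to undo at read time.

```java
// Sketch: store an inverted index as short[] instead of bit-packed ints.
// PAGE_SIZE and the class name are illustrative assumptions.
public class InvertedIndexSketch {
    static final int PAGE_SIZE = 32000; // fits within Short.MAX_VALUE (32767)

    // Encode row positions as shorts; valid only while positions < PAGE_SIZE.
    static short[] encode(int[] rowPositions) {
        short[] packed = new short[rowPositions.length];
        for (int i = 0; i < rowPositions.length; i++) {
            if (rowPositions[i] >= PAGE_SIZE) {
                throw new IllegalArgumentException("row position exceeds page size");
            }
            packed[i] = (short) rowPositions[i];
        }
        return packed;
    }

    public static void main(String[] args) {
        short[] packed = encode(new int[]{0, 5, 31999});
        System.out.println(packed[2]); // last position survives the narrowing cast
    }
}
```

Reading the index back is then a plain cast per entry, rather than unpacking variable-width bit fields.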
> To overcome the above two issues we are introducing a new format, V3.
> Here each blocklet has multiple pages of 32000 rows each, and the number of
> pages per blocklet is configurable. Since each page stays within the short
> limit, there is no need to compress the inverted index.
> We also maintain the max/min for each page to further prune filter queries.
> The blocklet is read with all its pages at once and kept in off-heap memory.
> During filtering, the max/min range is checked first, and only if the page
> might match do we decompress it to filter further.
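The max/min pruning step described above can be sketched as follows (hypothetical names, not CarbonData's actual filter API): a page is decompressed only when the filter value can lie inside the page's recorded value range.

```java
// Sketch: per-page min/max statistics let a filter skip pages entirely,
// avoiding decompression for pages whose value range cannot match.
public class PagePruningSketch {
    static final class PageStats {
        final long min, max; // illustrative per-page statistics
        PageStats(long min, long max) { this.min = min; this.max = max; }
    }

    // True if the page might contain filterValue and must be decompressed;
    // false means the page can be pruned without any further work.
    static boolean mightContain(PageStats stats, long filterValue) {
        return filterValue >= stats.min && filterValue <= stats.max;
    }

    public static void main(String[] args) {
        PageStats page = new PageStats(100, 500);
        System.out.println(mightContain(page, 50));  // outside range: prune
        System.out.println(mightContain(page, 250)); // inside range: decompress
    }
}
```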
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)