Like filters optimization for dictionary encoded columns

Jihong MA (JIRA) Thu, 15 Dec 2016 10:59:37 -0800

     [ 
https://issues.apache.org/jira/browse/CARBONDATA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jihong MA updated CARBONDATA-527:
---------------------------------
    Issue Type: New Feature  (was: Improvement)
       Summary: Greater than/less-than/Like filters optimization for dictionary 
encoded columns  (was: Greater than/less-than/Like filters optimization for 
dictionary columns)

> Greater than/less-than/Like filters optimization for dictionary encoded 
> columns
> -------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-527
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-527
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Sujith
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current design 
> In greater than/less-than/Like filters, system first iterates each row 
> present in the dictionary cache for identifying valid filter actual members  
> by applying the filter expression , once evaluation done system will hold the 
> list of identified valid filter actual member values(String), now in next 
> step again  system will look up the dictionary cache in order to identify the 
> dictionary surrogate values of the identified members. this look up is an 
> additional cost to our system even though the look up methodology is an 
> binary search in dictionary cache.
>  
> Proposed design/solution:
> Identify the dictionary surrogate values in filter expression evaluation step 
> itself  when actual dictionary values will be scanned for identifying valid 
> filter members .
> Keep a dictionary counter variable which will be increased  when system 
> iterates through  the dictionary cache in order to retrieve each actual 
> member stored in dictionary cache , after this system will evaluate each row 
> against the filter expression to identify whether its a valid filter member 
> or not, while doing this process itself counter value can be taken as valid 
> selected dictionary value since the actual member values and its  dictionary 
> values will be kept in same order in dictionary cache as the iteration order.
> thus it will eliminate the further dictionary look up step which is required  
> to retrieve the dictionary surrogate value against identified actual valid 
> filter member. this can also increase significantly the filter query 
> performance of such filter queries which require expression evaluation to 
> identify it the filter members by looking up dictionary cache, like greater 
> than/less-than/Like filters .
> Note : this optimization is applicable for dictionary columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CARBONDATA-527) Greater than/less-than/Like filters optimization for dictionary encoded columns

Reply via email to