[
https://issues.apache.org/jira/browse/CARBONDATA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jihong MA updated CARBONDATA-527:
---------------------------------
Issue Type: New Feature (was: Improvement)
Summary: Greater than/less-than/Like filters optimization for dictionary
encoded columns (was: Greater than/less-than/Like filters optimization for
dictionary columns)
> Greater than/less-than/Like filters optimization for dictionary encoded
> columns
> -------------------------------------------------------------------------------
>
> Key: CARBONDATA-527
> URL: https://issues.apache.org/jira/browse/CARBONDATA-527
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Sujith
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Current design:
> For greater-than, less-than and Like filters, the system first iterates over
> every member in the dictionary cache and applies the filter expression to
> identify the valid filter members (actual values). Once evaluation is done,
> the system holds the list of valid members as String values. In the next
> step it looks up the dictionary cache again to find the dictionary surrogate
> value of each identified member. This second lookup is an additional cost to
> the system, even though it is performed as a binary search over the
> dictionary cache.
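A minimal Java sketch of the two-pass flow described above, for illustration only; it is not CarbonData code. The dictionary cache is modelled as a plain `String[]` in which the member at index `i` has surrogate key `i + 1` (an assumption), and the members are assumed to be sorted so that `Arrays.binarySearch` can stand in for the cache lookup. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class TwoPassDictionaryFilter {

    // Hypothetical dictionary cache: the member at index i has surrogate key i + 1.
    // Members are assumed sorted so Arrays.binarySearch can model the cache lookup.
    static int[] resolveSurrogates(String[] dictionary, Predicate<String> filter) {
        // Pass 1: evaluate the filter expression against every actual member
        // and keep only the member values (Strings) that pass.
        List<String> validMembers = new ArrayList<>();
        for (String member : dictionary) {
            if (filter.test(member)) {
                validMembers.add(member);
            }
        }
        // Pass 2: look each valid member up again to recover its surrogate key.
        // This is the extra binary search the proposal wants to eliminate.
        int[] surrogates = new int[validMembers.size()];
        for (int i = 0; i < validMembers.size(); i++) {
            int pos = Arrays.binarySearch(dictionary, validMembers.get(i));
            surrogates[i] = pos + 1;
        }
        return surrogates;
    }

    public static void main(String[] args) {
        String[] dictionary = {"apple", "banana", "cherry", "date"};
        // Greater-than filter: member > 'banana'
        int[] result = resolveSurrogates(dictionary, m -> m.compareTo("banana") > 0);
        System.out.println(Arrays.toString(result)); // prints [3, 4]
    }
}
```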
>
> Proposed design/solution:
> Identify the dictionary surrogate values during the filter expression
> evaluation step itself, while the actual dictionary values are being scanned
> to find the valid filter members.
> Keep a dictionary counter that is incremented as the system iterates through
> the dictionary cache to retrieve each actual member. Each member is then
> evaluated against the filter expression to decide whether it is a valid
> filter member; if it is, the current counter value can be taken directly as
> its dictionary surrogate value, because the actual member values are stored
> in the dictionary cache in the same order as their surrogate values, which
> is also the iteration order.
> This eliminates the further dictionary lookup otherwise required to retrieve
> the surrogate value of each identified valid member. It can also
> significantly improve the performance of filter queries that must evaluate
> an expression against the dictionary cache to identify their filter members,
> such as greater-than, less-than and Like filters.
> Note: this optimization is applicable to dictionary encoded columns.
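A matching sketch of the proposed single pass, under the same hypothetical model (a `String[]` cache in which the member at iteration step `i` has surrogate key `i + 1`). It is an illustration, not CarbonData's implementation. Because the counter is read while each member is being evaluated, the second lookup disappears, and this sketch no longer needs the members to be sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class SinglePassDictionaryFilter {

    // Hypothetical dictionary cache: members are held in surrogate-key order,
    // so the member at iteration step i has surrogate key i + 1 (assumed 1-based).
    static int[] resolveSurrogates(String[] dictionary, Predicate<String> filter) {
        List<Integer> surrogates = new ArrayList<>();
        int surrogateCounter = 0; // advances in lockstep with the iteration
        for (String member : dictionary) {
            surrogateCounter++;
            // The counter already holds this member's surrogate value,
            // so a valid member needs no second dictionary lookup.
            if (filter.test(member)) {
                surrogates.add(surrogateCounter);
            }
        }
        return surrogates.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        String[] dictionary = {"delhi", "bangalore", "chennai", "bhopal"};
        // Like filter: member LIKE 'b%'
        int[] result = resolveSurrogates(dictionary, m -> m.startsWith("b"));
        System.out.println(Arrays.toString(result)); // prints [2, 4]
    }
}
```

On the sample dictionary the Like filter `b%` selects `bangalore` and `bhopal`, and the sketch returns their surrogates 2 and 4 directly from the counter, without any binary search.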
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)