[ https://issues.apache.org/jira/browse/KUDU-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561013#comment-17561013 ]
ASF subversion and git services commented on KUDU-1291:
-------------------------------------------------------
Commit 936d7edc4e4b69d2e1f1dffc96760cb3fd57a934 in kudu's branch
refs/heads/master from zhangyifan27
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=936d7edc4 ]
KUDU-1644: Simplify InList predicate values based on rowset PK bounds
Previously, we only optimized InList predicates based on tablet PK bounds; we
can also optimize them at the DRS level. By adding the implicit PK bounds, an
InList predicate can be simplified. Also, the DRS bounds info can be used to
skip rows effectively when we have a predicate on a non-prefix of the primary
key and the leading column(s) have cardinality=1 (as described in KUDU-1291).
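
The InList simplification can be pictured roughly as follows. This is only a
minimal Python sketch of the idea, not the code from the commit: the function
name simplify_in_list is hypothetical, the bounds are assumed to be inclusive
min/max values of the first PK column within one rowset, and keys are plain
integers for illustration.

```python
from bisect import bisect_left, bisect_right


def simplify_in_list(values, lower, upper):
    """Keep only the IN-list values inside the inclusive range [lower, upper].

    `lower` and `upper` stand in for the smallest and largest value of the
    leading PK column found in a single rowset (an assumption of this sketch).
    """
    ordered = sorted(set(values))          # de-duplicate and sort the candidates
    start = bisect_left(ordered, lower)    # first value >= lower
    end = bisect_right(ordered, upper)     # one past the last value <= upper
    return ordered[start:end]


# Hypothetical rowset whose leading PK column spans 20..29.
candidates = simplify_in_list([5, 21, 17, 28, 40, 21], 20, 29)
print(candidates)  # [21, 28]
```

If the simplified list comes back empty, no row in that rowset can satisfy the
predicate, so in this sketch the rowset would not need to be scanned at all.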
Benchmark test results (in slow mode):
before
Selected 10000 rows cost 2.519996 seconds. # PredicateOnFirstColumn
Selected 100 rows cost 2.040003 seconds. # PredicateOnSecondColumn
after
Selected 10000 rows cost 1.771755 seconds. # PredicateOnFirstColumn
Selected 100 rows cost 0.131996 seconds. # PredicateOnSecondColumn
Change-Id: Ia9c2aa958f19a0b62e40a2ef5eb5365f91cbab80
Reviewed-on: http://gerrit.cloudera.org:8080/18434
Tested-by: Kudu Jenkins
Reviewed-by: Yingchun Lai <[email protected]>
> Efficiently support predicates on non-prefix key components
> -----------------------------------------------------------
>
> Key: KUDU-1291
> URL: https://issues.apache.org/jira/browse/KUDU-1291
> Project: Kudu
> Issue Type: Sub-task
> Components: perf, tablet
> Reporter: Todd Lipcon
> Priority: Major
> Labels: performance, roadmap-candidate
>
> In a lot of workloads, users have a compound primary key where the first
> component (or few components) is low cardinality. For example, a time series
> workload may have (year, month, day, entity_id, timestamp) as a primary key.
> A metrics or log storage workload might have (hostname, timestamp).
> It's common to want to do cross-user or cross-date analytics like 'WHERE
> timestamp BETWEEN <a> and <b>' without specifying any predicate for the first
> column(s) of the PK. Currently, we do not execute this efficiently, but
> rather scan the whole table evaluating the predicate.
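
One way to picture the cardinality=1 case mentioned in the commit message,
using the (hostname, timestamp) key from the description above: if the smallest
and largest keys of a rowset share the same hostname, the timestamps in that
rowset must lie between the two keys' timestamps, so a range predicate on
timestamp can rule the rowset out. The sketch below is an illustration under
those assumptions only; rowset_may_match and the sample bounds are hypothetical
and do not reflect Kudu's actual data structures.

```python
def rowset_may_match(min_key, max_key, lo, hi):
    """Return False only when the rowset provably holds no matching row.

    min_key and max_key are (hostname, timestamp) tuples giving the smallest
    and largest primary key in the rowset. The predicate is
    lo <= timestamp <= hi.
    """
    if min_key[0] != max_key[0]:
        # The leading column varies, so the key bounds say nothing
        # about the timestamp column: the rowset must be scanned.
        return True
    # Leading column has a single value: timestamps lie in
    # [min_key[1], max_key[1]], so test for range overlap.
    return min_key[1] <= hi and lo <= max_key[1]


# Hypothetical per-rowset key bounds.
rowsets = [
    (("host-a", 100), ("host-a", 199)),  # one host, timestamps 100..199
    (("host-a", 200), ("host-a", 299)),  # one host, timestamps 200..299
    (("host-b", 50), ("host-c", 120)),   # several hosts, cannot be pruned
]

to_scan = [rs for rs in rowsets if rowset_may_match(rs[0], rs[1], 210, 250)]
print(len(to_scan))  # 2
```

Here the first rowset is skipped because its timestamps end before the
requested range starts, while the third is still scanned because its leading
column takes more than one value.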