[ 
https://issues.apache.org/jira/browse/KUDU-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823465#comment-15823465
 ] 

Haijie Hong commented on KUDU-1291:
-----------------------------------

Hi, I'm not sure how we would get the cardinality of a column. Should we set a 
threshold (such as cardinality / number of rows) to determine whether to 
apply the optimization?
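
To make the question concrete, a ratio-based check could look roughly like the sketch below. Everything in it is an assumption for illustration: the function name `should_optimize`, the 0.01 default threshold, and the exact distinct count over an in-memory sample are hypothetical and are not part of Kudu.

```python
def should_optimize(prefix_values, max_ratio=0.01):
    """Return True when distinct/total for the key prefix is at or below max_ratio.

    prefix_values: iterable of values of the leading key column(s), e.g. a sample.
    max_ratio: hypothetical threshold; the right value would need measurement.
    """
    values = list(prefix_values)
    if not values:
        # No data to judge by, so do not enable the optimization.
        return False
    ratio = len(set(values)) / len(values)
    return ratio <= max_ratio
```

For example, 1000 rows spread over two hostnames give a ratio of 0.002 and would pass the check, while a column that is unique per row gives a ratio of 1.0 and would not.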

> Efficiently support predicates on non-prefix key components
> -----------------------------------------------------------
>
>                 Key: KUDU-1291
>                 URL: https://issues.apache.org/jira/browse/KUDU-1291
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: perf, tablet
>            Reporter: Todd Lipcon
>
> In many workloads, users have a compound primary key whose first component
> (or first few components) has low cardinality. For example, a time series
> workload may have (year, month, day, entity_id, timestamp) as a primary key.
> A metrics or log storage workload might have (hostname, timestamp).
> It's common to want to do cross-user or cross-date analytics like 'WHERE
> timestamp BETWEEN <a> AND <b>' without specifying any predicate for the first
> column(s) of the PK. Currently, we do not execute this efficiently; instead,
> we scan the whole table, evaluating the predicate on every row.
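
One way to picture the alternative to a full scan is a "skip scan" over the sorted keys: for each distinct prefix value, seek to the start of the timestamp range, read the matching rows, then seek to the next prefix. The sketch below is only an illustration of that idea on an in-memory sorted list, not Kudu's implementation. The name `skip_scan`, the `(prefix, timestamp)` tuple layout, and the assumption that timestamps are numeric are all hypothetical.

```python
import bisect


def skip_scan(rows, lo, hi):
    """Return rows with lo <= timestamp <= hi from a sorted list of (prefix, timestamp).

    rows must be sorted ascending, as rows ordered by a compound key would be.
    Timestamps are assumed to be numeric so that float("inf") sorts after them.
    """
    out = []
    i = 0
    n = len(rows)
    while i < n:
        prefix = rows[i][0]
        # Seek to the first row of this prefix whose timestamp is >= lo.
        j = bisect.bisect_left(rows, (prefix, lo), i)
        # Read rows while still inside this prefix and inside the range.
        while j < n and rows[j][0] == prefix and rows[j][1] <= hi:
            out.append(rows[j])
            j += 1
        # Seek past the rest of this prefix to the next distinct prefix.
        i = bisect.bisect_right(rows, (prefix, float("inf")), j)
    return out
```

With this approach the work is roughly two seeks per distinct prefix plus the matching rows, which is why it only pays off when the leading component has low cardinality.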



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
