[
https://issues.apache.org/jira/browse/KYLIN-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhong Yanghong updated KYLIN-3487:
----------------------------------
Summary: Create a new measure for precise count distinct (was: Create a
new measure for count distinct)
> Create a new measure for precise count distinct
> -----------------------------------------------
>
> Key: KYLIN-3487
> URL: https://issues.apache.org/jira/browse/KYLIN-3487
> Project: Kylin
> Issue Type: Improvement
> Reporter: Zhong Yanghong
> Assignee: Zhong Yanghong
> Priority: Major
>
> At eBay, there are around 20M sessions each day, and there is a requirement
> to calculate the count distinct of sessions.
> For deep dives, users want to get the session cardinality over a year, or even
> several years. For just one year, the total cardinality will be around
> 20M*360 = 7.2B > 2B. It will exceed the upper limit of a bitmap, and
> will not be good for
> To calculate the count distinct of sessions, if a session never crosses days,
> it is meaningless to merge the related counter, bitmap or HLL, across days.
> Therefore, we may need a new measure containing a map, using the date info
> as the key and a bitmap or HLL as the value. When calculating count distinct,
> we only need to get the result from the state of each key-value entry and
> then sum the results. We do not need to merge bitmaps or HLLs across
> different key-value entries.
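
The proposal above could be sketched roughly as follows. This is only an illustration, not Kylin code: the class name PerDayDistinctCounter and its methods are hypothetical, a Python set stands in for the per-day bitmap or HLL, and the 2B limit is assumed to be the signed 32-bit maximum.

```python
from collections import defaultdict

# Assumption: the "2B" limit in the description is the signed 32-bit maximum.
INT32_MAX = 2**31 - 1

# Arithmetic from the description: one year of sessions exceeds that limit.
yearly_sessions = 20_000_000 * 360  # 7,200,000,000


class PerDayDistinctCounter:
    """Hypothetical measure: a map from date key to a per-day distinct state."""

    def __init__(self):
        # A set stands in for the bitmap or HLL held as the map value.
        self.states = defaultdict(set)

    def add(self, day, session_id):
        self.states[day].add(session_id)

    def merge(self, other):
        # States are merged only when they share the same date key;
        # nothing is merged across different days.
        for day, ids in other.states.items():
            self.states[day] |= ids
        return self

    def count(self):
        # Sum the per-day cardinalities instead of merging the states.
        return sum(len(ids) for ids in self.states.values())


a = PerDayDistinctCounter()
a.add("2018-08-01", 1)
a.add("2018-08-01", 2)

b = PerDayDistinctCounter()
b.add("2018-08-01", 2)
b.add("2018-08-02", 3)

total = a.merge(b).count()
```

Because each per-day state only ever holds about one day of sessions, no single bitmap needs to address the full multi-year cardinality. The sum is exact only under the stated assumption that a session never crosses days.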