[
https://issues.apache.org/jira/browse/KYLIN-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhong Yanghong updated KYLIN-3487:
----------------------------------
Description:
At eBay, there are around 20M sessions each day, and there is a requirement to
calculate the count distinct of sessions.
For deep dives, users want to get the session cardinality over a year, or even
several years. For just one year, the total cardinality will be around
20M*360 = 7.2B > 2B. It will exceed the upper limit of a bitmap, and will
not be good for
A session never crosses days, so when calculating the count distinct of
sessions it is meaningless to merge the related counter, bitmap or hll, across
days. Therefore, we may need a new measure containing a map, using the date
info as the key and a bitmap or hll as the value. When calculating count
distinct, we only need to get the state for each key-value entry and then add
up the per-entry counts. We don't need to merge bitmaps or hlls across
different key-value entries.
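A minimal sketch of the idea, not the Kylin implementation: the class name
PerDayDistinctCounter and its methods are hypothetical, and a plain Python set
stands in for the per-day bitmap or hll state. It assumes, as stated above,
that a session never crosses days.

```python
from collections import defaultdict
from datetime import date


class PerDayDistinctCounter:
    """Hypothetical measure: one distinct-value state per day key."""

    def __init__(self):
        # A plain set stands in for the per-day bitmap or hll state.
        self._states = defaultdict(set)

    def add(self, day, session_id):
        # Record a session under the day it belongs to.
        self._states[day].add(session_id)

    def merge(self, other):
        # States are only unioned when they share the same day key;
        # states for different days are never merged with each other.
        for day, ids in other._states.items():
            self._states[day] |= ids

    def count_distinct(self):
        # Sum of per-day cardinalities. This equals the true distinct
        # count only if no session appears under two different days.
        return sum(len(ids) for ids in self._states.values())


if __name__ == "__main__":
    counter = PerDayDistinctCounter()
    counter.add(date(2018, 8, 1), "s1")
    counter.add(date(2018, 8, 1), "s1")  # duplicate within the same day
    counter.add(date(2018, 8, 2), "s2")
    print(counter.count_distinct())
```

Because each per-day state stays small (around 20M entries in the scenario
above), no single state has to hold the multi-billion yearly cardinality.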
was:
At eBay, there is a requirement to calculate the count distinct of sessions.
Each day there are 20M sessions. For deep dives, users want to get the session
cardinality over a month, or even several months. For just one month, the
total cardinality will be around 20M*30
A session never crosses days, so when calculating the count distinct of
sessions it is meaningless to merge the related counter, bitmap or hll, across
days. Therefore, we may need a new measure containing a map, using the date
info as the key and a bitmap or hll as the value. When calculating count
distinct, we only need to get the state for each key-value entry and then add
up the per-entry counts. We don't need to merge bitmaps or hlls across
different key-value entries.
> Create a new measure for count distinct
> ---------------------------------------
>
> Key: KYLIN-3487
> URL: https://issues.apache.org/jira/browse/KYLIN-3487
> Project: Kylin
> Issue Type: Improvement
> Reporter: Zhong Yanghong
> Assignee: Zhong Yanghong
> Priority: Major
>
> At eBay, there are around 20M sessions each day, and there is a requirement
> to calculate the count distinct of sessions.
> For deep dives, users want to get the session cardinality over a year, or even
> several years. For just one year, the total cardinality will be around
> 20M*360 = 7.2B > 2B. It will exceed the upper limit of a bitmap, and
> will not be good for
> A session never crosses days, so when calculating the count distinct of
> sessions it is meaningless to merge the related counter, bitmap or hll,
> across days. Therefore, we may need a new measure containing a map, using the
> date info as the key and a bitmap or hll as the value. When calculating count
> distinct, we only need to get the state for each key-value entry and then add
> up the per-entry counts. We don't need to merge bitmaps or hlls across
> different key-value entries.