[ https://issues.apache.org/jira/browse/KYLIN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaofeng SHI updated KYLIN-980:
-------------------------------
Fix Version/s: v2.1 (was: 2.0)
Just noticed the change wasn't included in v2.0. Set the fixVersion to v2.1.
> FactDistinctColumnsJob to support high cardinality columns
> ----------------------------------------------------------
>
> Key: KYLIN-980
> URL: https://issues.apache.org/jira/browse/KYLIN-980
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Affects Versions: v0.7.2
> Reporter: Shaofeng SHI
> Assignee: Shaofeng SHI
> Labels: newbie
> Fix For: v2.1
>
>
> In FactDistinctColumnsJob's combiner and reducer, a HashSet is used to remove
> duplicate values. But if a column's cardinality is very high, say more than 10
> million, it may report an OutOfMemory error.
> It should be enhanced to support such cases.
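
For illustration only, here is a minimal sketch of one way to remove duplicates without holding every distinct value in memory. It is not the actual Kylin change. It assumes the values reach the reducer already sorted (Hadoop MapReduce sorts reducer input by key, so this would require emitting the column value as the map output key) and that values are non-null. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Kylin or Hadoop.
class SortedDistinct {

    // Emits each distinct value once. Assumes `sorted` yields non-null values
    // in sorted order, so equal values are adjacent. Only the previous value
    // is kept, so memory use does not grow with the column's cardinality.
    static long emitDistinct(Iterator<String> sorted, Consumer<String> sink) {
        String previous = null;
        long count = 0;
        while (sorted.hasNext()) {
            String current = sorted.next();
            if (!current.equals(previous)) {
                sink.accept(current); // first occurrence of this value
                count++;
                previous = current;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        long n = emitDistinct(
                Arrays.asList("a", "a", "b", "c", "c").iterator(), out::add);
        System.out.println(n + " distinct: " + out); // prints "3 distinct: [a, b, c]"
    }
}
```

The trade-off is that the de-duplication cost moves into the shuffle sort, while a HashSet keeps every distinct value on the heap at once.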
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)