[ https://issues.apache.org/jira/browse/KYLIN-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118510#comment-15118510 ]
Edward Zhang commented on KYLIN-980:
------------------------------------
Do we have a pull request for this?
> FactDistinctColumnsJob to support high cardinality columns
> ----------------------------------------------------------
>
> Key: KYLIN-980
> URL: https://issues.apache.org/jira/browse/KYLIN-980
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Affects Versions: v0.7.2
> Reporter: Shaofeng SHI
> Assignee: Shaofeng SHI
> Labels: newbie
> Fix For: 2.0
>
>
> In FactDistinctColumnsJob's combiner and reducer, a HashSet is used to remove
> duplicate values. But if a column's cardinality is very high, say more than
> 10 million, the job may report an OutOfMemory error.
> It should be enhanced to support such cases.
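
To illustrate the problem described above, here is a minimal sketch of one common alternative to an in-memory HashSet. It is not taken from the Kylin code base or from any patch for this issue. It assumes the values reach the deduplication step already sorted, so equal values are adjacent. In Hadoop MapReduce that holds when the column value is part of the key, because the framework sorts keys before they reach the reducer. The class name SortedDistinct and the method distinctFromSorted are hypothetical, and the returned list only stands in for writing each value to the job output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SortedDistinct {

    /**
     * Emits each distinct value once. Assumes the input is sorted (equal
     * values are adjacent) and contains no nulls. Only the previous value
     * is remembered, so the working state does not grow with cardinality.
     */
    public static List<String> distinctFromSorted(Iterator<String> sortedValues) {
        // Stand-in for the job output; a real job would write each value
        // out instead of collecting it in memory.
        List<String> out = new ArrayList<>();
        String previous = null;
        while (sortedValues.hasNext()) {
            String current = sortedValues.next();
            if (!current.equals(previous)) {
                out.add(current);
                previous = current;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("a", "a", "b", "c", "c", "c");
        System.out.println(distinctFromSorted(sorted.iterator()));
    }
}
```

With a HashSet, memory use grows with the number of distinct values. With sorted input, the only state kept is the last value seen.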