[
https://issues.apache.org/jira/browse/KYLIN-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16307624#comment-16307624
]
Billy Liu commented on KYLIN-2866:
----------------------------------
Thanks [~yaho] & [~lidong_sjtu]
To enable this feature, set:
kylin.engine.mr.hll-max-reduce-number = 50 (the default is 1, which disables
multi-reducer HLL calculation)
kylin.engine.mr.per-reducer-hll-cuboid-number = 100 (the default value)
> Enlarge the reducer number for hyperloglog statistics calculation at step
> FactDistinctColumnsJob
> ------------------------------------------------------------------------------------------------
>
> Key: KYLIN-2866
> URL: https://issues.apache.org/jira/browse/KYLIN-2866
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Zhong Yanghong
> Assignee: Zhong Yanghong
> Fix For: v2.3.0
>
> Attachments: APACHE-KYLIN-2866-refined.patch, APACHE-KYLIN-2866.patch
>
>
> Currently only one reducer is assigned to the HLL stats calculation, which can
> become a bottleneck and slow down this step. Since the stats for different
> cuboids do not influence each other, it's better to divide the cuboid set
> into several subsets and assign a reducer to each subset.
> The strategy of this patch is to assign 100 cuboids to each subset, with
> an upper limit on the number of reducers for the HLL stats calculation.
> Currently the limit is 50.
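As a rough illustration of the strategy described above, the number of HLL
reducers can be thought of as the cuboid count divided by the per-reducer
cuboid number, capped at the configured maximum. The sketch below is not taken
from the patch: the function name, the rounding-up behaviour, and the floor of
one reducer are assumptions made for illustration only. The two constants
mirror the values quoted in this thread.

```python
import math

# Values quoted in this thread; the names are hypothetical stand-ins for
# kylin.engine.mr.per-reducer-hll-cuboid-number and
# kylin.engine.mr.hll-max-reduce-number.
CUBOIDS_PER_REDUCER = 100
MAX_HLL_REDUCERS = 50


def hll_reducer_count(n_cuboids,
                      per_reducer=CUBOIDS_PER_REDUCER,
                      max_reducers=MAX_HLL_REDUCERS):
    """Estimate how many reducers the HLL stats calculation would use.

    Assumption: subsets are filled with up to `per_reducer` cuboids each,
    the count is rounded up, and at least one reducer is always used.
    """
    if n_cuboids <= 0:
        return 1
    needed = math.ceil(n_cuboids / per_reducer)
    # Never exceed the configured cap, never drop below one reducer.
    return max(1, min(needed, max_reducers))


if __name__ == "__main__":
    for n in (80, 250, 10000):
        print(n, "cuboids ->", hll_reducer_count(n), "reducer(s)")
```

Under these assumptions, 250 cuboids would be spread over 3 reducers, and a
cube with 10000 cuboids would hit the cap of 50. With the maximum left at 1,
the result is always a single reducer, which matches the note that the default
disables multi-reducer calculation.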