[
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821788#comment-15821788
]
XIE FAN commented on KYLIN-2217:
--------------------------------
Leaving the UHC dictionary building to the job engine is OK, but it may
create a single-point bottleneck. In fact, KYLIN-2217 was designed to remove
this bottleneck. If we want to take advantage of both KYLIN-2217 and
KYLIN-2135, there is another way: scan the fact table twice. The first scan
tells us the distribution of data in the UHC columns. In the second scan we
can then split the values across multiple reducers and, based on the result
of the first scan, guarantee the ordering between reducers. This would
resolve the conflict, but it may require a lot of modification.
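
To make the two-scan idea concrete, here is a minimal sketch in Python. It is
not Kylin code: the function names are hypothetical, the "scans" are plain
in-memory loops, and the first scan collects exact distinct values, whereas a
real job would probably sample. It only illustrates that range boundaries
from the first scan keep the reducers ordered, so each locally built
dictionary maps into one global ID space by adding an offset.

```python
from bisect import bisect_right


def compute_boundaries(values, num_reducers):
    """First scan (hypothetical): pick range boundaries from the distinct values."""
    distinct = sorted(set(values))
    if not distinct or num_reducers <= 1:
        return []
    step = len(distinct) / num_reducers
    boundaries = []
    for i in range(1, num_reducers):
        idx = int(i * step)
        # Skip degenerate or repeated boundaries (fewer distinct values than reducers).
        if 0 < idx < len(distinct):
            candidate = distinct[idx]
            if not boundaries or candidate > boundaries[-1]:
                boundaries.append(candidate)
    return boundaries


def partition(value, boundaries):
    """Second scan: route a value to a reducer so that reducer i < reducer i+1."""
    return bisect_right(boundaries, value)


def build_local_dictionaries(values, boundaries):
    """Each reducer builds a sorted dictionary over only the values it received."""
    buckets = [set() for _ in range(len(boundaries) + 1)]
    for value in values:
        buckets[partition(value, boundaries)].add(value)
    return [sorted(bucket) for bucket in buckets]


def assign_global_ids(local_dictionaries):
    """Global ID = offset of the reducer's range + position in its local dictionary."""
    ids = {}
    offset = 0
    for local in local_dictionaries:
        for position, value in enumerate(local):
            ids[value] = offset + position
        offset += len(local)
    return ids


if __name__ == "__main__":
    column = ["d", "a", "c", "b", "a", "f", "e", "d"]
    bounds = compute_boundaries(column, 3)
    print(assign_global_ids(build_local_dictionaries(column, bounds)))
```

Because every value routed to one reducer sorts before every value routed to
the next, the combined IDs preserve the global sort order without any single
node having to hold the whole dictionary.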
> Reducers build dictionaries locally
> -----------------------------------
>
> Key: KYLIN-2217
> URL: https://issues.apache.org/jira/browse/KYLIN-2217
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.5.4.1
> Reporter: XIE FAN
> Assignee: XIE FAN
> Fix For: v2.0.0
>
> Attachments: 0001-KYLIN-2217-Reducers-build-dictionaries-locally.patch
>
>
> In KYLIN-1851, we reduced the peak memory usage of the dictionary-building
> procedure by splitting a single Trie tree structure into a Trie forest. But
> there still exists a bottleneck: all the dictionaries are built in the Kylin
> client. In this issue, we want to use multiple reducers to build different
> dictionaries locally and concurrently, which can further reduce the peak
> memory usage as well as speed up the dictionary-building procedure.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)