XIE FAN commented on KYLIN-2217:

Leaving the UHC dictionary building to the job engine is OK, but it may cause
a single-point bottleneck, which is exactly what KYLIN-2217 is designed to
remove. If we want to take advantage of both KYLIN-2217 and KYLIN-2135, there
is another way: scan the fact table twice. The first scan tells us the
distribution of the data in the UHC columns. In the second scan we can split
the values across multiple reducers and, based on the result of the first
scan, keep the value ranges ordered between reducers. This would resolve the
conflict, but it may require a lot of modification.
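
To make the two-scan idea concrete, here is a rough sketch. It is not taken
from the attached patch. The class and method names are hypothetical, and
String natural ordering is assumed to stand in for whatever ordering the
dictionary really uses. The first scan is modeled as a non-empty sample of
column values, from which split points are derived. The second scan routes
each value by binary search over those split points, so every value sent to
reducer i sorts no later than any value sent to reducer i+1. This is similar
in spirit to Hadoop's TotalOrderPartitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of range partitioning for a UHC column; not Kylin code.
final class UhcRangePartitionSketch {

    // First scan: derive (numReducers - 1) split points from a non-empty
    // sample of the column values, taken at evenly spaced ranks.
    public static String[] computeSplitPoints(List<String> sample, int numReducers) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        String[] splits = new String[numReducers - 1];
        for (int i = 1; i < numReducers; i++) {
            splits[i - 1] = sorted.get(i * sorted.size() / numReducers);
        }
        return splits;
    }

    // Second scan: pick the reducer for a value. The mapping is monotone in
    // the value, so reducer outputs are ordered relative to each other.
    public static int partition(String value, String[] splits) {
        int pos = Arrays.binarySearch(splits, value);
        // Exact hit on split k goes to reducer k + 1; otherwise use the
        // insertion point, which binarySearch encodes as -(point) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

With four reducers and the sample a..h, the split points are c, e, g, so "a"
goes to reducer 0, "d" to reducer 1 and "z" to reducer 3. How evenly the load
is spread depends entirely on how representative the first scan is.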

> Reducers build dictionaries locally
> -----------------------------------
>                 Key: KYLIN-2217
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2217
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v1.5.4.1
>            Reporter: XIE FAN
>            Assignee: XIE FAN
>             Fix For: v2.0.0
>         Attachments: 0001-KYLIN-2217-Reducers-build-dictionaries-locally.patch
> In KYLIN-1851, we reduced the peak memory usage of the dictionary-building 
> procedure by splitting a single Trie tree structure into a Trie forest. But 
> there still exists a bottleneck: all the dictionaries are built in the Kylin 
> client. In this issue, we want to use multiple reducers to build different 
> dictionaries locally and concurrently, which can further reduce the peak 
> memory usage as well as speed up the dictionary-building procedure.

This message was sent by Atlassian JIRA
