[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154356#comment-17154356
 ] 

ASF subversion and git services commented on KYLIN-4342:
--------------------------------------------------------

Commit f9ef8c699920b0d98fc2ad7a310a3b44738c883f in kylin's branch 
refs/heads/master from Zhong, Yanghong
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=f9ef8c6 ]

KYLIN-4342 Fix incorrect database


> Build Global Dict by MR/Hive New Version
> ----------------------------------------
>
>                 Key: KYLIN-4342
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4342
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: wangxiaojing
>            Assignee: wangxiaojing
>            Priority: Major
>             Fix For: v3.1.0
>
>
> At present, the implementation of the global dictionary through MR/Hive 
> has two limitations and some distributed concurrency lock bugs:
> 1. Because Hive's ORDER BY performs a global sort in the shuffle stage, 
> memory use and build time become uncontrollable once the data volume 
> reaches the billion level. In our test with a cardinality of about 800 
> million and 15 GB of memory configured, building the dictionary took more 
> than 10 hours.
> 2. Multiple global dictionary columns are calculated serially.
> 3. There are some distributed concurrency lock bugs.
> We have improved the original version. The general idea of the new version 
> is the same as the previous MR/Hive implementation, that is, to complete 
> global dictionary encoding through Hive or MR, and then replace the original 
> values in the flat table with the dictionary-encoded values. 
> [MR/Hive V1|http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html]
> However, the new version adds two MR steps, "parallel part build" and 
> "parallel total build", to replace the original "build dict" step, so as 
> to solve the above two limitations. It also uses ZK to solve the 
> distributed concurrency lock bugs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
