[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120199#comment-17120199 ]

ASF subversion and git services commented on KYLIN-4342:
--------------------------------------------------------

Commit a2489aaf4560adf7f415629519d6e4b617967dce in kylin's branch refs/heads/master from wangxiaojing
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=a2489aa ]

KYLIN-4342 Build Global Dict by MR/Hive New Version, fix some potential bugs, such as null pointer exceptions

> Build Global Dict by MR/Hive New Version
> ----------------------------------------
>
>                 Key: KYLIN-4342
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4342
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: wangxiaojing
>            Assignee: wangxiaojing
>            Priority: Major
>             Fix For: v3.1.0
>
>
> At present, the implementation of the global dictionary through MR/Hive has
> two limitations and some distributed concurrency lock bugs:
> 1. Because Hive's ORDER BY performs a global sort in the shuffle stage,
> memory usage and build time become uncontrollable once the data volume
> reaches the billion level. In our test with a cardinality of about 800
> million and 15 GB of memory configured, building the dictionary took more
> than 10 hours.
> 2. Multiple global dictionary columns are calculated serially.
> 3. Some distributed concurrency lock bugs.
> We have improved the original version. The general idea of the new version
> is the same as the previous MR/Hive implementation: complete the global
> dictionary encoding through Hive or MR, and then replace the original value
> in the flat table with the dictionary-encoded value.
> [MR/Hive V1|http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html]
> However, the new version replaces the original "build dict" step with two
> MR steps, "parallel part build" and "parallel total build", to address the
> first two limitations. It also uses ZooKeeper (ZK) to fix the distributed
> concurrency lock bugs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
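
The issue text names the "parallel part build" and "parallel total build" steps but does not spell out how they work. The following is a minimal sketch of one plausible reading, not Kylin's actual code: it assumes that the part build encodes disjoint slices of the distinct values independently with local ids, and that the total build shifts each slice's local ids by the sizes of the slices before it, so no global sort is needed. All function names, the CRC32 partitioning, and the thread pool standing in for MR tasks are assumptions made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def part_build(sorted_values):
    """Hypothetical 'part build': give one slice's values local ids 1..n."""
    return {value: local_id for local_id, value in enumerate(sorted_values, start=1)}


def total_build(part_dicts):
    """Hypothetical 'total build': shift local ids by the sizes of earlier slices."""
    sizes = [len(part) for part in part_dicts]
    # Offset of slice i is the number of values in slices 0..i-1.
    offsets = [0] + list(accumulate(sizes))[:-1]
    global_dict = {}
    for offset, part in zip(offsets, part_dicts):
        for value, local_id in part.items():
            global_dict[value] = offset + local_id
    return global_dict


def build_global_dict(values, num_parts=4):
    """Encode the distinct values of one column as dense integer ids."""
    # Assumed partitioning: a stable hash sends each distinct value to one slice.
    slices = [[] for _ in range(num_parts)]
    for value in set(values):
        slices[zlib.crc32(value.encode("utf-8")) % num_parts].append(value)
    # Each slice is sorted and encoded on its own; threads stand in for MR tasks.
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        part_dicts = list(pool.map(lambda s: part_build(sorted(s)), slices))
    return total_build(part_dicts)


if __name__ == "__main__":
    print(build_global_dict(["u3", "u1", "u2", "u1", "u3", "u4"], num_parts=3))
```

In this sketch each slice is sorted only locally, which is the property that would avoid the single global ORDER BY described in limitation 1. The resulting ids are unique and dense but not globally ordered by value.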