[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120189#comment-17120189
 ] 

ASF GitHub Bot commented on KYLIN-4342:
---------------------------------------

hit-lacus commented on pull request #1207:
URL: https://github.com/apache/kylin/pull/1207#issuecomment-636310944


   Thank you @wangxiaojing123, let's merge it into the master branch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build Global Dict by MR/Hive New Version
> ----------------------------------------
>
>                 Key: KYLIN-4342
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4342
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: wangxiaojing
>            Assignee: wangxiaojing
>            Priority: Major
>             Fix For: v3.1.0
>
>
> At present, the MR/Hive implementation of the global dictionary has two 
> limitations and some distributed concurrency lock bugs:
> 1. Because Hive's ORDER BY performs a global sort in the shuffle stage, 
> memory use and build time become uncontrollable once the data volume reaches 
> the billion level. In our test with a cardinality of about 800 million and 
> 15 GB of memory configured, building the dictionary took more than 10 hours;
> 2. Multiple global dictionary columns are calculated serially.
> 3. Some distributed concurrency lock bugs.
> We have improved the original version. The general idea of the new version 
> is the same as the previous MR/Hive implementation: complete global 
> dictionary encoding through Hive or MR, and then replace the original values 
> in the flat table with the dictionary-encoded values. 
> [MR/Hive V1|http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html]
> However, the new version adds two MR steps, "parallel part build" and 
> "parallel total build", to replace the original "build dict" step, which 
> addresses the two limitations above. It also uses ZooKeeper (ZK) to fix the 
> distributed concurrency lock bugs.
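
The two new steps described above can be pictured roughly as follows. This is
only an illustrative Python sketch, not Kylin's actual MR code: the function
names part_build and total_build, the CRC32 hash bucketing, and the
offset scheme are all assumptions made for illustration. The point it tries to
show is that each bucket is sorted locally (no global ORDER BY), and that
once per-bucket offsets are known, every bucket can assign IDs independently.

```python
from collections import defaultdict
from zlib import crc32


def part_build(values, num_parts):
    """Hypothetical 'part build': bucket distinct values by hash.

    Each bucket stands in for one reducer; it only deduplicates and
    sorts its own values, so no global sort is required.
    """
    parts = defaultdict(set)
    for v in values:
        parts[crc32(v.encode("utf-8")) % num_parts].add(v)
    return {p: sorted(vs) for p, vs in parts.items()}


def total_build(parts, existing=None):
    """Hypothetical 'total build': give each bucket a disjoint ID range.

    Values already in the dictionary keep their IDs. New values get
    IDs above the current maximum, using a running offset per bucket.
    """
    dictionary = dict(existing or {})
    base = max(dictionary.values(), default=0)

    # Keep only values that are not encoded yet.
    new_parts = {
        p: [v for v in vs if v not in dictionary] for p, vs in parts.items()
    }

    # Prefix sums over bucket sizes give each bucket its starting offset.
    offsets = {}
    running = base
    for p in sorted(new_parts):
        offsets[p] = running
        running += len(new_parts[p])

    # With offsets fixed, each bucket could be processed in parallel.
    for p, vs in new_parts.items():
        for i, v in enumerate(vs, start=1):
            dictionary[v] = offsets[p] + i
    return dictionary


first = total_build(part_build(["a", "b", "c", "a", "b", "d"], 3))
second = total_build(part_build(["a", "e"], 3), first)
```

Here second extends first without changing any ID that was already
assigned, which is the property a global dictionary needs across builds.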



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
