[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxiaojing updated KYLIN-4342:
--------------------------------
    Description: 
At present, the implementation of the global dictionary through MR/Hive has two 
limitations and some distributed concurrency lock bugs:
1. Because Hive's ORDER BY performs a global sort in the shuffle stage, memory 
usage and build time become uncontrollable once the data volume reaches the 
billion level. In our test with a base (cardinality) at the 800 million level 
and 15 GB of memory configured, the build dictionary step took more than 10 
hours.
2. Multiple global dictionary columns are calculated serially.
3. There are some distributed concurrency lock bugs.

We have improved the original version. The general idea of the new version is 
the same as that of the previous MR/Hive implementation: complete the global 
dictionary encoding through Hive or MR, and then replace the original values in 
the flat table with the dictionary-encoded values. See [MR/Hive 
V1|http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html].
However, the new version adds two MR steps, "parallel part build" and 
"parallel total build", to replace the original "build dict" step and so 
remove the two limitations above. It also uses ZK to fix the distributed 
concurrency lock bugs.

  was:
At present, the implementation of the global dictionary through MR/Hive has two 
limitations:
 1. Because Hive's ORDER BY performs a global sort in the shuffle stage, memory 
usage and build time become uncontrollable once the data volume reaches the 
billion level. In our test with a base (cardinality) at the 800 million level 
and 15 GB of memory configured, the build dictionary step took more than 10 
hours.
 2. Multiple global dictionary columns are calculated serially.

We have improved the original version. The general idea of the new version is 
the same as that of the previous MR/Hive implementation: complete the global 
dictionary encoding through Hive or MR, and then replace the original values in 
the flat table with the dictionary-encoded values. See [MR/Hive 
V1|http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html].
However, the new version adds two MR steps, "parallel part build" and 
"parallel total build", to replace the original "build dict" step and so 
remove the two limitations above. It also uses ZK to fix the distributed 
concurrency lock bugs.


> Build Global Dict by MR/Hive New Version
> ----------------------------------------
>
>                 Key: KYLIN-4342
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4342
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: Future
>            Reporter: wangxiaojing
>            Assignee: wangxiaojing
>            Priority: Major
>
> At present, the implementation of the global dictionary through MR/Hive has 
> two limitations and some distributed concurrency lock bugs:
> 1. Because Hive's ORDER BY performs a global sort in the shuffle stage, 
> memory usage and build time become uncontrollable once the data volume 
> reaches the billion level. In our test with a base (cardinality) at the 800 
> million level and 15 GB of memory configured, the build dictionary step took 
> more than 10 hours.
> 2. Multiple global dictionary columns are calculated serially.
> 3. There are some distributed concurrency lock bugs.
> We have improved the original version. The general idea of the new version is 
> the same as that of the previous MR/Hive implementation: complete the global 
> dictionary encoding through Hive or MR, and then replace the original values 
> in the flat table with the dictionary-encoded values. See [MR/Hive 
> V1|http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html].
> However, the new version adds two MR steps, "parallel part build" and 
> "parallel total build", to replace the original "build dict" step and so 
> remove the two limitations above. It also uses ZK to fix the distributed 
> concurrency lock bugs.
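
The description names the two new steps but does not spell out how they work. 
The following is only a minimal sketch of one plausible reading, not the actual 
Kylin implementation: "parallel part build" is assumed to hash-partition the 
distinct values into buckets and number them locally, and "parallel total 
build" is assumed to shift each bucket's local ids by the number of entries in 
the preceding buckets, so that no global sort is needed. The function names, 
the CRC32 partitioning, and the `start_id` parameter are all hypothetical.

```python
import zlib


def part_build(values, num_buckets):
    """Assumed 'part build': partition distinct values, number them per bucket."""
    buckets = [set() for _ in range(num_buckets)]
    for v in values:
        # CRC32 gives a partition that is stable across runs and processes.
        buckets[zlib.crc32(v.encode("utf-8")) % num_buckets].add(v)
    # Buckets are independent, so each could be handled by a separate task.
    return [{v: i for i, v in enumerate(sorted(b), start=1)} for b in buckets]


def total_build(parts, start_id=0):
    """Assumed 'total build': turn local ids into globally unique ids."""
    global_dict = {}
    offset = start_id
    for part in parts:
        for v, local_id in part.items():
            global_dict[v] = offset + local_id
        offset += len(part)  # the next bucket starts after this one's ids
    return global_dict


def build_global_dict(values, num_buckets=4):
    return total_build(part_build(values, num_buckets))


def encode_column(values, global_dict):
    """Replace each original value with its dictionary-encoded id."""
    return [global_dict[v] for v in values]
```

Under these assumptions, `encode_column(col, build_global_dict(col))` gives 
every distinct value in `col` a unique id from a dense range starting at 1, 
and sorting happens only inside each bucket.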



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
