[ 
https://issues.apache.org/jira/browse/KYLIN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-2404:
--------------------------------
    Description: 
Since 1.5.3, Kylin runs a "redistribute" step after creating the intermediate 
Hive table to merge small files to a proper size. In some users' environments, 
Hive's own small-file merge is also enabled by default, which costs additional 
CPU and hurts cube building performance (in the extreme case the files are 
merged to 256MB, so only a very small number of mappers are started for the 
build).

So Kylin should explicitly tell Hive to disable its small-file merge feature 
when creating and redistributing the intermediate flat table. This change adds 
"hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml 
with value "false". The meaning of these two parameters can be found in 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
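
For illustration, the two entries might look like the sketch below. It assumes 
conf/kylin_hive_conf.xml uses the Hadoop-style configuration/property layout; 
the comments paraphrase the Hive documentation, and the exact text of the 
committed change may differ.

```xml
<!-- conf/kylin_hive_conf.xml: sketch of the two added entries (layout assumed) -->
<configuration>
    <!-- Do not merge small files at the end of a map-only job -->
    <property>
        <name>hive.merge.mapfiles</name>
        <value>false</value>
    </property>
    <!-- Do not merge small files at the end of a map-reduce job -->
    <property>
        <name>hive.merge.mapredfiles</name>
        <value>false</value>
    </property>
</configuration>
```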

  was:
Since 1.5.3, Kylin runs a "redistribute" step after creating the intermediate 
Hive table to merge small files to a proper size. In some users' environments, 
Hive's own small-file merge is also enabled by default, which costs additional 
CPU and hurts cube building performance (in the extreme case the files are 
merged to 256MB, so only a very small number of mappers are started for the 
build).

So Kylin should explicitly tell Hive to disable its small-file merge feature 
when creating and redistributing the intermediate flat table. This change adds 
"hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml 
with value "false".


> Add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to kylin_hive_conf.xml
> -----------------------------------------------------------------------------
>
>                 Key: KYLIN-2404
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2404
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Shaofeng SHI
>            Assignee: Shaofeng SHI
>            Priority: Minor
>             Fix For: v2.0.0
>
>
> Since 1.5.3, Kylin runs a "redistribute" step after creating the 
> intermediate Hive table to merge small files to a proper size. In some 
> users' environments, Hive's own small-file merge is also enabled by default, 
> which costs additional CPU and hurts cube building performance (in the 
> extreme case the files are merged to 256MB, so only a very small number of 
> mappers are started for the build).
> So Kylin should explicitly tell Hive to disable its small-file merge feature 
> when creating and redistributing the intermediate flat table. This change 
> adds "hive.merge.mapfiles" and "hive.merge.mapredfiles" to 
> conf/kylin_hive_conf.xml with value "false". The meaning of these two 
> parameters can be found in 
> https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
