[
https://issues.apache.org/jira/browse/KYLIN-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shaofeng SHI updated KYLIN-702:
-------------------------------
Description:
When building a cube, I noticed that a very large number of mappers (more than
10,000) are started while building the dictionary and calculating the cube. The
logs show that many of these mappers have zero or very few records to process,
which confused me.
Then I checked the storage location of the flat table and found that it
contains many files; I counted them and the number matches the number of
mappers.
Too many mappers cause significant overhead and degrade the cluster's
performance. Kylin should ask Hive to merge these small files in the step that
creates the flat table.
In my Hadoop cluster, hive.merge.mapredfiles was set to false (the default
value). After changing it to true for Kylin's job, the number of files in the
intermediate table dropped to 4, each up to 256 MB, which looks good. See the
Hive configuration documentation at:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
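A minimal sketch of the Hive-side change described above, assuming the
properties are set per session before the statement that populates the flat
table. How Kylin would inject them into its job is not specified in this issue.
Only hive.merge.mapredfiles is named in the report; the other three are related
Hive merge properties shown with what are believed to be their documented
defaults, included here as assumptions:

```sql
-- Merge small files at the end of a map-reduce job (false by default;
-- this is the property changed in the report above).
SET hive.merge.mapredfiles=true;

-- Merge small files at the end of a map-only job (assumed default: true).
SET hive.merge.mapfiles=true;

-- Target size of each merged file, in bytes (assumed default, about 256 MB,
-- consistent with the "up to 256M" files observed above).
SET hive.merge.size.per.task=256000000;

-- A merge pass is triggered only when the average output file size is below
-- this threshold, in bytes (assumed default, about 16 MB).
SET hive.merge.smallfiles.avgsize=16000000;
```

With these in effect, Hive runs an extra merge stage after the insert, so the
intermediate table ends up with a few large files rather than one small file
per writer task.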
was:
When I build a cube, I noticed that when build the dictionary and calculate the
cube, there are a large number of mappers be started (more than 10,000); With
the log I noticed many mappers has 0 or much less records to process, this
confused me;
Then I checked the storage location of the flat table, found there are many
files; I did a count and found it is the same number as the mappers;
Too many mappers will cause much overhead, and download the cluster's
performance; Kylin should ask Hive to merge those small files during creating
the flat table step.
> When Kylin creates the flat Hive table, it generates a large number of small
> files in HDFS
> ----------------------------------------------------------------------------------------
>
> Key: KYLIN-702
> URL: https://issues.apache.org/jira/browse/KYLIN-702
> Project: Kylin
> Issue Type: Improvement
> Components: General
> Affects Versions: v0.7.1
> Reporter: Shaofeng SHI
>