Shaofeng SHI created KYLIN-702:
----------------------------------

             Summary: When Kylin creates the flat Hive table, it generates a large 
number of small files in HDFS 
                 Key: KYLIN-702
                 URL: https://issues.apache.org/jira/browse/KYLIN-702
             Project: Kylin
          Issue Type: Improvement
          Components: General
    Affects Versions: v0.7.1
            Reporter: Shaofeng SHI


When building a cube, I noticed that the dictionary-building and cube-calculation 
steps start a very large number of mappers (more than 10,000). From the logs I 
saw that many mappers had zero or very few records to process, which confused 
me. 

I then checked the storage location of the flat table and found that it contains 
many files. I counted them, and the number of files matched the number of mappers. 
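
For anyone who wants to repeat the file count, here is a minimal sketch. It 
assumes the `hdfs` command-line client is on the PATH and relies on the column 
layout of `hdfs dfs -count` (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME). 
The helper names, the sample line, and the path are made up for illustration 
and are not taken from the actual cluster. 

```python
import subprocess


def parse_count_line(line):
    """Parse one line of `hdfs dfs -count` output.

    Columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
    """
    dirs, files, size, path = line.split(None, 3)
    return int(dirs), int(files), int(size), path.strip()


def count_files(path):
    """Run `hdfs dfs -count` on a directory (needs the hdfs CLI installed)."""
    out = subprocess.run(
        ["hdfs", "dfs", "-count", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_count_line(out.splitlines()[0])


# Made-up sample line, for illustration only (not real cluster output).
sample = "           1        10000          5000000 /tmp/hypothetical_flat_table"
dirs, files, size, path = parse_count_line(sample)
avg_file_size = size // files  # average bytes per file
```

A file count close to the mapper count, together with a very small average 
file size, would point to the same symptom described here. 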

Too many mappers cause significant overhead and degrade the cluster's 
performance. Kylin should ask Hive to merge those small files during the step 
that creates the flat table. 
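
One possible way to do this, sketched below, is to prepend Hive's file-merge 
settings to the statement that populates the flat table. The four `hive.merge.*` 
properties are existing Hive configuration properties; the values chosen, the 
helper name `with_merge_settings`, and the table names are assumptions for 
illustration only, not what Kylin currently does. 

```python
# Hive configuration properties that control small-file merging.
# The values are illustrative assumptions, not Kylin or Hive defaults.
MERGE_SETTINGS = {
    "hive.merge.mapfiles": "true",                # merge output of map-only jobs
    "hive.merge.mapredfiles": "true",             # merge output of map-reduce jobs
    "hive.merge.size.per.task": "256000000",      # target size of merged files (bytes)
    "hive.merge.smallfiles.avgsize": "64000000",  # merge if avg output file is smaller
}


def with_merge_settings(flat_table_sql, settings=MERGE_SETTINGS):
    """Prefix a HiveQL statement with SET commands that enable file merging."""
    prefix = "".join(f"SET {key}={value};\n" for key, value in settings.items())
    return prefix + flat_table_sql


# Hypothetical flat-table statement; both table names are placeholders.
script = with_merge_settings(
    "INSERT OVERWRITE TABLE kylin_flat_table SELECT * FROM fact_table;"
)
print(script)
```

With these settings Hive launches an extra merge job when the average output 
file size falls below `hive.merge.smallfiles.avgsize`, so later steps would read 
fewer, larger files and start fewer mappers. 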



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
