[jira] [Created] (KYLIN-1677) Distribute source data by certain columns when creating flat table

Shaofeng SHI (JIRA) Wed, 11 May 2016 00:07:40 -0700

Shaofeng SHI created KYLIN-1677:
-----------------------------------

             Summary: Distribute source data by certain columns when creating flat table
                 Key: KYLIN-1677
                 URL: https://issues.apache.org/jira/browse/KYLIN-1677
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
            Reporter: Shaofeng SHI
            Assignee: Shaofeng SHI



Inspired by KYLIN-1656, Kylin can distribute the source data by certain columns 
when creating the flat Hive table. The rows assigned to each mapper will then be 
more similar, so more aggregation can happen on the mapper side and less 
shuffle and reduce work is needed.
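
To illustrate the effect, here is a toy Python simulation (not Kylin code). It 
assumes a single dimension column, a count-style measure, and three mappers. 
All function names are hypothetical. It compares how many partially aggregated 
records the mappers would emit to the shuffle when rows are split arbitrarily 
versus when they are distributed by the dimension column first.

```python
import zlib
from collections import defaultdict


def partial_aggregate(rows):
    """Mapper-side (combiner) step: sum the measure per key within one split."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def split_round_robin(rows, num_mappers):
    """Arbitrary split: rows are dealt to mappers in arrival order."""
    splits = [[] for _ in range(num_mappers)]
    for i, row in enumerate(rows):
        splits[i % num_mappers].append(row)
    return splits


def split_by_key(rows, num_mappers):
    """Distributed split: rows with the same key always go to the same mapper."""
    splits = [[] for _ in range(num_mappers)]
    for row in rows:
        # crc32 gives a hash that is stable across processes (unlike hash()).
        bucket = zlib.crc32(row[0].encode("utf-8")) % num_mappers
        splits[bucket].append(row)
    return splits


def shuffled_records(splits):
    """Number of records leaving the mappers after partial aggregation."""
    return sum(len(partial_aggregate(split)) for split in splits)


# Hypothetical source data: 64 rows over 8 distinct key values.
rows = [(f"user{i % 8}", 1) for i in range(64)]

arbitrary = shuffled_records(split_round_robin(rows, 3))
distributed = shuffled_records(split_by_key(rows, 3))
print(arbitrary, distributed)
```

With this input, the arbitrary split leaves every mapper holding all 8 keys, so 
24 records are shuffled. After distributing by key, each key lives on exactly 
one mapper and only 8 records are shuffled. The final aggregates are the same 
in both cases.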

Columns that can be used for the distribution include: ultra-high-cardinality 
columns, mandatory columns, partition date/time columns, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
