[jira] [Commented] (KYLIN-1677) Distribute source data by certain columns when creating flat table

Shaofeng SHI (JIRA) Tue, 07 Jun 2016 23:19:48 -0700

    [ 
https://issues.apache.org/jira/browse/KYLIN-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320080#comment-15320080
 ]


Shaofeng SHI commented on KYLIN-1677:
-------------------------------------

Dayue, thanks a lot for doing this report! When the fact table is a view, 
KYLIN-1656 is better; otherwise KYLIN-1677 is better. It is honestly hard for 
me to make a choice. The good thing is that we can combine the two approaches 
(by detecting the fact table type) if KYLIN-1656 is unacceptable in some 
extreme case.

BTW, with KYLIN-1656 there is a count step between the step that creates the 
flat table and the redistribute step; is its time included in the above report? 
Thanks!

> Distribute source data by certain columns when creating flat table
> ------------------------------------------------------------------
>
>                 Key: KYLIN-1677
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1677
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Shaofeng SHI
>            Assignee: Shaofeng SHI
>             Fix For: v1.5.3
>
>
> Inspired by KYLIN-1656, Kylin can distribute the source data by certain 
> columns when creating the flat Hive table. The data assigned to each mapper 
> will then be more similar, so more aggregation can happen on the mapper side 
> and less shuffle and reduce work is needed.
> Columns that can be used for the distribution include: ultra-high-cardinality 
> columns, mandatory columns, partition date/time columns, etc.
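
To illustrate why distributing by a column can help mapper-side aggregation, 
here is a minimal Python sketch. It is not Kylin code: the function names, the 
row layout (key, measure), the CRC32-based partitioning, and the round-robin 
baseline are all assumptions made for the example. It only counts how many 
partial records the mappers would hand to the shuffle under each layout.

```python
import zlib
from collections import defaultdict


def split_round_robin(rows, num_splits):
    """Baseline: rows land in splits regardless of their key (hypothetical layout)."""
    splits = [[] for _ in range(num_splits)]
    for i, row in enumerate(rows):
        splits[i % num_splits].append(row)
    return splits


def split_by_key(rows, num_splits):
    """Distribute rows so that all rows sharing a key land in the same split."""
    splits = [[] for _ in range(num_splits)]
    for row in rows:
        key = row[0]
        # CRC32 gives a deterministic hash; any stable hash would do here.
        bucket = zlib.crc32(str(key).encode("utf-8")) % num_splits
        splits[bucket].append(row)
    return splits


def records_after_mapper_aggregation(splits):
    """Sum measures per key inside each split; return how many records get shuffled."""
    emitted = 0
    for split in splits:
        partial = defaultdict(int)
        for key, measure in split:
            partial[key] += measure
        emitted += len(partial)  # one partial record per distinct key in the split
    return emitted


# 40 rows over 5 distinct keys, interleaved so that keys are spread everywhere.
rows = [(i % 5, 1) for i in range(40)]

unordered = records_after_mapper_aggregation(split_round_robin(rows, 4))
distributed = records_after_mapper_aggregation(split_by_key(rows, 4))
print(unordered, distributed)
```

With the key-based layout, each key is aggregated in exactly one split, so the 
number of shuffled records drops to the number of distinct keys. How large the 
gain is on real data depends on the cardinality and skew of the chosen columns.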



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
