[ https://issues.apache.org/jira/browse/KYLIN-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320080#comment-15320080 ]
Shaofeng SHI commented on KYLIN-1677:
-------------------------------------
Dayue, thanks a lot for doing this report! When the fact table is a view,
KYLIN-1656's approach is better; otherwise KYLIN-1677's is better. Honestly it
is hard for me to choose. The good thing is that we can combine the two
approaches (by detecting the fact table type) if KYLIN-1656 turns out to be
unacceptable in some extreme cases.
BTW, with KYLIN-1656 there is a count step between creating the flat table and
the redistribute step; is its time included in the above report? Thanks!
> Distribute source data by certain columns when creating flat table
> ------------------------------------------------------------------
>
> Key: KYLIN-1677
> URL: https://issues.apache.org/jira/browse/KYLIN-1677
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Shaofeng SHI
> Assignee: Shaofeng SHI
> Fix For: v1.5.3
>
>
> Inspired by KYLIN-1656, Kylin can distribute the source data by certain
> columns when creating the flat Hive table. The data assigned to each mapper
> will then be more similar, so more aggregation can happen on the mapper side,
> and less shuffle and reduce work is needed.
> Columns that can be used for the distribution include: ultra-high-cardinality
> columns, mandatory columns, the partition date/time column, etc.
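A rough way to see why distributing by a column helps is a toy simulation. This is only a sketch under stated assumptions, not Kylin code: each file of the flat table becomes one mapper's input split, hashing one column stands in for Hive's DISTRIBUTE BY, and map-side aggregation collapses rows with identical dimension values into a single output record. The helper names (split_round_robin, split_by_column, map_side_output) and the sample columns user_id and dt are hypothetical.

```python
import random


def split_round_robin(rows, n_mappers):
    """Assign rows to mappers without looking at their content (hypothetical baseline)."""
    splits = [[] for _ in range(n_mappers)]
    for i, row in enumerate(rows):
        splits[i % n_mappers].append(row)
    return splits


def split_by_column(rows, n_mappers, col):
    """Assign rows to mappers by hashing one column, a stand-in for DISTRIBUTE BY col.

    Rows with the same value in `col` always land in the same split.
    """
    splits = [[] for _ in range(n_mappers)]
    for row in rows:
        splits[hash(row[col]) % n_mappers].append(row)
    return splits


def map_side_output(splits, dims):
    """Count the records all mappers emit after map-side aggregation on `dims`.

    Within one mapper, rows sharing the same dimension values collapse into
    one record, so each mapper emits one record per distinct tuple it sees.
    """
    return sum(
        len({tuple(row[d] for d in dims) for row in split})
        for split in splits
    )


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical fact rows: 1000 users over 7 days.
    rows = [
        {"user_id": rng.randrange(1000), "dt": f"2016-06-{rng.randrange(1, 8):02d}"}
        for _ in range(100_000)
    ]
    dims = ("user_id", "dt")
    print("round-robin splits :", map_side_output(split_round_robin(rows, 10), dims))
    print("split by user_id   :", map_side_output(split_by_column(rows, 10, "user_id"), dims))
```

When the distribution column is one of the aggregated dimensions, every distinct dimension combination lands in exactly one mapper, so the map output shrinks to the number of distinct combinations; with content-blind splits the same combination is emitted once by every mapper that happens to see it, and all those copies must be shuffled to the reducers.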
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)