[ https://issues.apache.org/jira/browse/KYLIN-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784573#comment-15784573 ]
Shaofeng SHI commented on KYLIN-2328:
-------------------------------------

+1 good points; when there are lots of segments, submitting every segment's dict each time is expensive, and this patch optimizes that very well.

> Reduce the size of metadata uploaded to distributed cache
> ---------------------------------------------------------
>
>                 Key: KYLIN-2328
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2328
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: all
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
>             Fix For: v2.0.0
>
>         Attachments: KYLIN-2328.patch
>
>
> Currently, each MR job uploads all the metadata belonging to a cube to the
> distributed cache. As the total size of metadata grows, the submission
> time ("MapReduce Waiting" in the Monitor UI) grows with it and can become a
> significant problem.
> We could instead reduce the amount of metadata uploaded according to the
> type of job, for example:
> * CuboidJob only needs the dictionaries of the segment being built
> * CubeHFileJob doesn't need any dictionary

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
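The per-job-type selection described in the issue can be sketched roughly as below. This is a minimal, hypothetical illustration, not the code in KYLIN-2328.patch: the class `MetadataSelector`, the `JobType` enum, the method `selectForJob`, and all resource paths are invented for the example. It assumes every job needs some common cube metadata, that dictionaries can be grouped by segment, and that the job knows which segment it is building.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: pick which metadata resources a MapReduce job ships to
 * the distributed cache, based on the job type. Names and paths are
 * illustrative only and are not Kylin's actual API.
 */
public class MetadataSelector {

    /** Illustrative job types mirroring the two examples in the issue. */
    enum JobType { CUBOID, CUBE_HFILE }

    /**
     * @param commonPaths     metadata every job needs (e.g. cube and cube desc)
     * @param dictsBySegment  dictionary resource paths grouped by segment name
     * @param buildingSegment name of the segment this job is building
     */
    static List<String> selectForJob(JobType type,
                                     List<String> commonPaths,
                                     Map<String, List<String>> dictsBySegment,
                                     String buildingSegment) {
        List<String> selected = new ArrayList<>(commonPaths);
        switch (type) {
            case CUBOID:
                // Ship only the building segment's dictionaries, not every segment's.
                selected.addAll(dictsBySegment.getOrDefault(buildingSegment, Collections.emptyList()));
                break;
            case CUBE_HFILE:
                // Converting cuboid files to HFiles needs no dictionaries at all.
                break;
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> common = List.of("/cube/sales.json", "/cube_desc/sales.json");
        Map<String, List<String>> dicts = new LinkedHashMap<>();
        dicts.put("seg1", List.of("/dict/seg1/A.dict", "/dict/seg1/B.dict"));
        dicts.put("seg2", List.of("/dict/seg2/A.dict"));

        // Prints: [/cube/sales.json, /cube_desc/sales.json, /dict/seg2/A.dict]
        System.out.println(selectForJob(JobType.CUBOID, common, dicts, "seg2"));
        // Prints: [/cube/sales.json, /cube_desc/sales.json]
        System.out.println(selectForJob(JobType.CUBE_HFILE, common, dicts, "seg2"));
    }
}
```

With many segments, the CUBOID case keeps the upload size roughly constant per job instead of growing with the number of segments, which is the cost the comment above points out.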