[
https://issues.apache.org/jira/browse/KYLIN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492269#comment-16492269
]
Zhong Yanghong commented on KYLIN-3388:
---------------------------------------
!Hive Issue - distribute by rand().png!
As the figure above shows, once the map step completes, the data for each
reducer has been prepared. Suppose R1 runs first. It pulls data D1,1 & D2,1
from the mappers and finishes. Then R2 starts, but by this time M2 has become
unavailable, so R2 requests a new mapper, M'2, to redo M2's work. After M'2
has prepared data D'2,1 & D'2,2, R2 pulls data D1,2 from M1 and data D'2,2
from M'2. Finally, R2 finishes its job.
The input for the reducers thus becomes D1,1 & D2,1 and D1,2 & D'2,2, rather
than D1,1 & D2,1 and D1,2 & D2,2. Since the partitioner for this Hive job is
not fixed (rows are distributed by rand()), D2,2 and D'2,2 are rarely the
same. Rows that land in a different partition on the rerun can be duplicated
or lost, so the final result becomes incorrect.
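The scenario can be reproduced outside Hive with a small simulation. This is only a sketch under stated assumptions: the row names, seeds, and helper functions (run_mapper, run_mapper_fixed) are hypothetical and are not Kylin or Hive code. It models two mappers and two reducers, reruns M2 with a different random sequence, and counts the rows that end up duplicated or missing. For contrast, it also shows that a partitioner derived from the row content yields the same buckets on a rerun.

```python
import random
import zlib


def run_mapper(rows, num_reducers, rng):
    """Split rows into one bucket per reducer with a random partitioner,
    mimicking "distribute by rand()" (hypothetical helper)."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[rng.randrange(num_reducers)].append(row)
    return buckets


def run_mapper_fixed(rows, num_reducers):
    """Same split, but the bucket depends only on the row content
    (hypothetical helper, shown for contrast)."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[zlib.crc32(row.encode()) % num_reducers].append(row)
    return buckets


m1_rows = [f"m1-{i}" for i in range(1000)]
m2_rows = [f"m2-{i}" for i in range(1000)]

d1 = run_mapper(m1_rows, 2, random.Random(1))        # M1:  D1,1 and D1,2
d2 = run_mapper(m2_rows, 2, random.Random(2))        # M2:  D2,1 and D2,2
d2_retry = run_mapper(m2_rows, 2, random.Random(3))  # M'2: D'2,1 and D'2,2

r1_input = d1[0] + d2[0]        # R1 pulled from M1 and M2 before M2 was lost
r2_input = d1[1] + d2_retry[1]  # R2 pulled from M1 and the rerun M'2

seen = r1_input + r2_input
duplicated = len(seen) - len(set(seen))            # in both D2,1 and D'2,2
missing = len(set(m1_rows + m2_rows) - set(seen))  # in both D2,2 and D'2,1
print("duplicated rows:", duplicated)
print("missing rows:", missing)

# With a content-based partitioner, a rerun reproduces the same buckets.
fixed_first = run_mapper_fixed(m2_rows, 2)
fixed_retry = run_mapper_fixed(m2_rows, 2)
print("fixed partitioner is repeatable:", fixed_first == fixed_retry)
```

The duplicated rows are those that M2 sent to R1 but M'2 sent to R2. The missing rows are those that M2 sent to R2 but M'2 sent to R1.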
> Data may become not correct if mappers fail during the cube building step,
> "distribute by rand()"
> -------------------------------------------------------------------------------------------------
>
> Key: KYLIN-3388
> URL: https://issues.apache.org/jira/browse/KYLIN-3388
> Project: Kylin
> Issue Type: Bug
> Reporter: Zhong Yanghong
> Priority: Critical
> Attachments: Hive Issue - distribute by rand().png
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)