[
https://issues.apache.org/jira/browse/TEZ-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091851#comment-14091851
]
Bikas Saha edited comment on TEZ-1396 at 8/9/14 7:40 PM:
---------------------------------------------------------
This is not something that's always desirable. In a busy cluster, when a data
set is hot, there are equally good reasons to spread different consumers
around to avoid hot spots. This jira mainly helps cases where some active
service is trying to cache data. So this behavior should not be the default,
but should be enabled when it's known to be helpful.
> Grouping should generate consistent groups when given the same set of Splits
> ----------------------------------------------------------------------------
>
> Key: TEZ-1396
> URL: https://issues.apache.org/jira/browse/TEZ-1396
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
>
> Currently, it seems that Grouping can generate a different set of groups
> on different invocations with the same set of splits and target tasks.
> The order is likely affected by the randomization in the block location
> report from HDFS.
> Grouping should be consistent for better cache utilization.
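
To illustrate the idea in the description, here is a minimal sketch of one way to make grouping independent of input order. It is not the Tez grouper: the `Split` class, `canonicalize`, and `groupByPrimaryHost` are hypothetical names introduced for illustration, and grouping by the first host is a deliberately simplified stand-in for real locality-aware grouping. The sketch assumes that sorting each split's host list and then sorting the splits by path and offset is enough to cancel out the randomized replica order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ConsistentGrouping {

    /** Hypothetical stand-in for an input split: file path, start offset, replica hosts. */
    static final class Split {
        final String path;
        final long start;
        final List<String> hosts;

        Split(String path, long start, List<String> hosts) {
            this.path = path;
            this.start = start;
            this.hosts = hosts;
        }
    }

    /** Returns a copy with each host list sorted and the splits ordered by (path, start). */
    static List<Split> canonicalize(List<Split> splits) {
        List<Split> out = new ArrayList<>();
        for (Split s : splits) {
            List<String> hosts = new ArrayList<>(s.hosts);
            Collections.sort(hosts); // removes dependence on replica report order
            out.add(new Split(s.path, s.start, hosts));
        }
        out.sort(Comparator.comparing((Split s) -> s.path)
                           .thenComparingLong(s -> s.start));
        return out;
    }

    /**
     * Simplified grouping: each split joins the group of its first host in
     * sorted order. A TreeMap keeps the groups themselves in a stable order.
     */
    static Map<String, List<String>> groupByPrimaryHost(List<Split> splits) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Split s : canonicalize(splits)) {
            String primary = s.hosts.isEmpty() ? "" : s.hosts.get(0);
            groups.computeIfAbsent(primary, k -> new ArrayList<>())
                  .add(s.path + ":" + s.start);
        }
        return groups;
    }
}
```

Calling `groupByPrimaryHost` twice with the same splits in a different order, or with the hosts of each split listed in a different order, yields equal maps. Per the comment above, a real implementation would likely make this opt-in, since always picking the same host for a hot data set can create hot spots.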