[jira] [Comment Edited] (TEZ-1396) Grouping should generate consistent groups when given the same set of Splits

Bikas Saha (JIRA) Sat, 09 Aug 2014 12:41:24 -0700

    [ https://issues.apache.org/jira/browse/TEZ-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091851#comment-14091851 ]


Bikas Saha edited comment on TEZ-1396 at 8/9/14 7:40 PM:
---------------------------------------------------------

This is not something that's always desirable. In a busy cluster, when a data 
set is hot, there are equally good reasons to spread different consumers 
around to avoid hot spots. This jira mainly helps cases where some active 
service is trying to cache data. So this behavior should not be the default, 
but should be enabled when it's known to be helpful.


was (Author: bikassaha):
This is not something that's always desirable. In a busy cluster, when a data 
set is hot, there are equally good reasons to spread different consumers 
around to avoid hot spots. The intent of this jira mainly helps cases where 
there is some active service trying to cache data.

> Grouping should generate consistent groups when given the same set of Splits
> ----------------------------------------------------------------------------
>
>                 Key: TEZ-1396
>                 URL: https://issues.apache.org/jira/browse/TEZ-1396
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>
> Currently, it seems like Grouping can end up generating a different set of 
> groups on different invocations of the same set of splits and target tasks.
> The order likely gets affected by the randomization in the block location 
> report from HDFS.
> This should be consistent for better cache utilization.
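
A minimal sketch of the idea, not the actual Tez grouping code: if splits are 
put into a canonical order before they are grouped, the resulting groups no 
longer depend on the order in which the splits arrived. The class 
ConsistentGroupingSketch, its Split type, and the `consistent` flag are all 
hypothetical names, and the round-robin assignment is a stand-in that ignores 
locality. The flag is opt-in, in line with the comment above that this should 
not be the default.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ConsistentGroupingSketch {

    /** Hypothetical stand-in for an input split: a file path and a start offset. */
    static final class Split {
        final String path;
        final long offset;

        Split(String path, long offset) {
            this.path = path;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return path + "@" + offset;
        }
    }

    /**
     * Assigns splits to numGroups groups round-robin. When consistent is true,
     * a copy of the input is first sorted by (path, offset), so the same set of
     * splits yields the same groups regardless of arrival order.
     */
    static List<List<String>> group(List<Split> splits, int numGroups, boolean consistent) {
        List<Split> ordered = new ArrayList<>(splits); // never mutate the caller's list
        if (consistent) {
            ordered.sort(Comparator.comparing((Split s) -> s.path)
                                   .thenComparingLong(s -> s.offset));
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numGroups; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < ordered.size(); i++) {
            groups.get(i % numGroups).add(ordered.get(i).toString());
        }
        return groups;
    }
}
```

With `consistent` set to false, the same splits supplied in a different order 
can land in different groups, which is the behavior the issue describes.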



--
This message was sent by Atlassian JIRA
(v6.2#6252)
