[ 
https://issues.apache.org/jira/browse/TEZ-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370823#comment-14370823
 ] 

Siddharth Seth commented on TEZ-2209:
-------------------------------------

Minor stuff.
- shuffleInfoEventsMap in ShuffleManager should be a ConcurrentMap - it can be 
accessed from multiple threads, outside of any synchronization. This is not 
required in ShuffleScheduler, though, since access there is synchronized. I 
missed this in the earlier review for pipelined shuffle.
- In the reportFatalError invocation - it would be useful to include both the 
currently registered attemptNumber and the one that caused the error.
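
Roughly what I have in mind for both points, as a minimal sketch rather than 
the actual patch. ShuffleAttemptTracker, registeredAttempts and checkAttempt 
are hypothetical names, and the Integer-to-Integer map is only a stand-in for 
shuffleInfoEventsMap, whose real key and value types are not shown here. The 
returned message is what would be handed to reportFatalError; that call is 
left out because its signature is not part of this comment.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not part of Tez: tracks which attempt each input is bound to.
final class ShuffleAttemptTracker {
    // Stand-in for shuffleInfoEventsMap: input identifier -> registered attempt number.
    // A ConcurrentMap keeps unsynchronized access from multiple threads safe.
    private final ConcurrentMap<Integer, Integer> registeredAttempts =
        new ConcurrentHashMap<>();

    /**
     * Binds inputId to attemptNumber if no attempt is registered yet.
     * Returns null when the attempt is accepted, otherwise a diagnostic
     * naming both the registered attempt and the offending one.
     */
    String checkAttempt(int inputId, int attemptNumber) {
        // putIfAbsent is atomic, so two threads cannot both register an attempt.
        Integer existing = registeredAttempts.putIfAbsent(inputId, attemptNumber);
        if (existing == null || existing == attemptNumber) {
            return null;
        }
        return "Received data for input " + inputId
            + " from attempt " + attemptNumber
            + ", but attempt " + existing + " is already registered";
    }
}
```

With this shape, whichever attempt delivers data first is the one that gets 
registered, and the diagnostic names both attempts.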

The rest looks good to me.

> Fix pipelined shuffle to fetch data from any one attempt
> --------------------------------------------------------
>
>                 Key: TEZ-2209
>                 URL: https://issues.apache.org/jira/browse/TEZ-2209
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2209.1.patch, TEZ-2209.2.patch, TEZ-2209.3.patch
>
>
> - Currently, pipelined shuffle will fail-fast the moment it receives data 
> from an attempt other than 0.  This was done as an add-on check to prevent 
> data being copied from speculated attempts.
> - However, in some scenarios (like LLAP), it is possible that the task 
> attempt gets killed before generating any data.  In such cases, attempt 
> #1 or a later attempt would generate the actual data.
> - This jira is created to allow pipelined shuffle to download data from any 
> one attempt. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
