[ 
https://issues.apache.org/jira/browse/TEZ-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296580#comment-14296580
 ] 

Rajesh Balamohan commented on TEZ-2001:
---------------------------------------

As each incremental DataMovementEvent (DME, one per spill file) arrives,
fetchers are allowed to download the corresponding data. For example, assume
the sorter in PipelinedSorter is going to produce 4 spill segments: when a
segment is spilled, a DME is sent out and a fetcher starts downloading it.
The last DME can also be processed in parallel on the consumer side. However,
the consumer ensures that all previous spills pertaining to the attempt have
been downloaded before declaring success (i.e. all 4 DMEs must have been
processed before the consumer declares that it has downloaded the data from
the attempt). This allows data to be downloaded in parallel while it is still
being generated at the source.
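
The consumer-side bookkeeping described above could be sketched roughly as
follows. This is only an illustration, not code from the patch: SpillTracker
and its method names are hypothetical, and it assumes that each event carries
a 0-based spill id plus a flag marking the final spill of the attempt.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Tez class): tracks which spills of each source
// attempt have been fetched on the consumer side.
class SpillTracker {
    private static final class AttemptState {
        final BitSet fetched = new BitSet();
        // Unknown (-1) until the event for the final spill has been seen.
        int expectedSpills = -1;
    }

    private final Map<String, AttemptState> attempts = new HashMap<>();

    // Records one downloaded spill. Spills may finish in any order.
    // Returns true once every spill of the attempt has been fetched.
    synchronized boolean spillFetched(String attemptId, int spillId,
                                      boolean isLastSpill) {
        AttemptState s =
            attempts.computeIfAbsent(attemptId, k -> new AttemptState());
        s.fetched.set(spillId);
        if (isLastSpill) {
            // Assumption: spill ids are 0-based and contiguous.
            s.expectedSpills = spillId + 1;
        }
        return isComplete(s);
    }

    synchronized boolean isComplete(String attemptId) {
        AttemptState s = attempts.get(attemptId);
        return s != null && isComplete(s);
    }

    private static boolean isComplete(AttemptState s) {
        // Complete only when the total is known and ids 0..total-1 are set.
        return s.expectedSpills > 0
            && s.fetched.nextClearBit(0) >= s.expectedSpills;
    }
}
```

With 4 spills, the attempt is reported as complete only after all four ids
have been recorded, even if the final spill is downloaded before the others.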

Merging happens in parallel (in memory or on disk, depending on available
resources). When only part of an attempt's data has been downloaded, there is
a chance that this data is merged and the source task then dies midway. In
subsequent JIRAs, we need to refactor the InMemory and Disk merges so that
they do not consider partially downloaded data and only consider attempts for
which all data has been downloaded.
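
One possible shape for that follow-up is to filter merge inputs by attempt
completeness before handing them to a merger. The sketch below is
hypothetical and is not part of the attached patches: MergeEligibility,
Segment and eligibleForMerge are made-up names, and the set of completed
attempts is assumed to be maintained elsewhere by the consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not Tez code): select only the fetched segments that
// belong to attempts whose spills have all been downloaded.
class MergeEligibility {
    static final class Segment {
        final String attemptId;
        final int spillId;

        Segment(String attemptId, int spillId) {
            this.attemptId = attemptId;
            this.spillId = spillId;
        }
    }

    // Returns the segments that are safe to merge. Segments from attempts
    // that are still partially downloaded are left out, so that a source
    // task dying midway cannot leave its data inside a merged output.
    static List<Segment> eligibleForMerge(List<Segment> fetched,
                                          Set<String> completedAttempts) {
        List<Segment> eligible = new ArrayList<>();
        for (Segment s : fetched) {
            if (completedAttempts.contains(s.attemptId)) {
                eligible.add(s);
            }
        }
        return eligible;
    }
}
```

The trade-off is that segments from incomplete attempts stay unmerged for
longer, which may increase memory or disk pressure on the consumer.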

> Support pipelined data transfer for ordered output
> --------------------------------------------------
>
>                 Key: TEZ-2001
>                 URL: https://issues.apache.org/jira/browse/TEZ-2001
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2001.1.patch, TEZ-2001.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)