[ 
https://issues.apache.org/jira/browse/TEZ-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2001:
----------------------------------
    Attachment: TEZ-2001.6.patch


Changing remaining to a List from a Set in the Fetcher leads to some 
inefficiency - since the size of this list can be ~30, and remove() calls can 
be expensive. We may want to fix this later - by using the spillId in the 
hashCode - or a wrapping structure for just this.
- spillId cannot be added to the hashCode, as that would break 
shuffleInfoEventsMap in ShuffleScheduler. We might have to consider using a Map 
keyed by an identifier instead. Will create a separate JIRA for this.
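
For illustration only, here is a minimal sketch of the "wrapping structure" 
idea. It is not taken from the patch. AttemptId, SpillKey and RemainingInputs 
are hypothetical names, and the sketch assumes the identifier's equals/hashCode 
must keep ignoring spillId. The wrapper key adds spillId, so removal becomes a 
hash lookup rather than a list scan.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for an input attempt identifier; not the Tez class.
final class AttemptId {
    final int inputIndex;
    final int attemptNumber;
    final int spillId;

    AttemptId(int inputIndex, int attemptNumber, int spillId) {
        this.inputIndex = inputIndex;
        this.attemptNumber = attemptNumber;
        this.spillId = spillId;
    }

    // Assumption: equality deliberately ignores spillId, so maps that are
    // keyed by this class keep their existing behaviour.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof AttemptId)) return false;
        AttemptId other = (AttemptId) o;
        return inputIndex == other.inputIndex && attemptNumber == other.attemptNumber;
    }

    @Override
    public int hashCode() {
        return Objects.hash(inputIndex, attemptNumber);
    }
}

// Wrapper key that distinguishes spills without changing AttemptId.hashCode().
final class SpillKey {
    private final AttemptId id;

    SpillKey(AttemptId id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SpillKey)) return false;
        SpillKey other = (SpillKey) o;
        return id.equals(other.id) && id.spillId == other.id.spillId;
    }

    @Override
    public int hashCode() {
        return 31 * id.hashCode() + id.spillId;
    }
}

// Keeps insertion order like a List, but removes entries by hash lookup.
final class RemainingInputs {
    private final Map<SpillKey, AttemptId> remaining = new LinkedHashMap<>();

    void add(AttemptId id) {
        remaining.put(new SpillKey(id), id);
    }

    boolean remove(AttemptId id) {
        return remaining.remove(new SpillKey(id)) != null;
    }

    int size() {
        return remaining.size();
    }
}
```

With ~30 entries the difference is small either way; the sketch only shows 
that spill-level identity can live in a wrapper rather than in the identifier.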

..PathComponent - so will work. We probably should have the fetchers use a 
method from TezTaskOutputFiles to be more consistent....
 - DiskMerger code works fine with this change. Added a TODO for making use of 
TezTaskOutputFiles in FetcherOrderedGrouped. Will add a method in 
TezTaskOutputFiles in a follow-up JIRA.

Minor: "Speculative execution needs to be turned when using this parameter" - 
"off" missing
- Fixed. Missed it for the pipelined shuffle case.

ShuffleScheduler - dedupedList.put(inputNumber, id); - Is it possible for id to 
have an older revision compared to what's in the oldList ? I think that check 
should be in place.
- The old revision is removed from oldIdList. Please let me know if I am 
missing anything here.
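
As a hedged illustration of the revision check discussed above (not the actual 
ShuffleScheduler code), the sketch below keeps an entry per input number and 
rejects a candidate whose revision is older than the one already stored. 
RevisionDedup, Entry and the integer "revision" field are assumptions made for 
the example.

```java
import java.util.HashMap;
import java.util.Map;

final class RevisionDedup {
    // Hypothetical record: an input number plus an increasing revision.
    static final class Entry {
        final int inputNumber;
        final int revision;

        Entry(int inputNumber, int revision) {
            this.inputNumber = inputNumber;
            this.revision = revision;
        }
    }

    private final Map<Integer, Entry> deduped = new HashMap<>();

    // Stores the candidate unless a strictly newer revision is already present.
    // Returns true if the candidate was stored.
    boolean offer(Entry candidate) {
        Entry current = deduped.get(candidate.inputNumber);
        if (current != null && current.revision > candidate.revision) {
            return false; // candidate is older than what we already have
        }
        deduped.put(candidate.inputNumber, candidate);
        return true;
    }

    // Returns the stored revision, or -1 if the input number is unknown.
    int revisionFor(int inputNumber) {
        Entry e = deduped.get(inputNumber);
        return e == null ? -1 : e.revision;
    }
}
```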

DefaultSorter - "if (spillRecord == null) { ... else writeIndexFile" - By this 
point, indexCacheList will always be populated, which means we could end up 
over-writing previously written spill index files.
- Fixed. The code now checks whether spillFileIndexPaths already has the path 
details; if so, the index files are not written again.
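
The guard described above could look roughly like the following. This is a 
sketch under assumptions, not the DefaultSorter change itself: 
SpillIndexWriter, writeIndexIfAbsent and the file naming are hypothetical, and 
only the name spillFileIndexPaths comes from the comment. The point is that an 
index file is written once per spill and later calls are skipped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class SpillIndexWriter {
    // Maps a spill number to the index file already written for it.
    private final Map<Integer, Path> spillFileIndexPaths = new HashMap<>();
    private final Path dir;

    SpillIndexWriter(Path dir) {
        this.dir = dir;
    }

    // Writes the index file for a spill only if none was recorded before.
    // Returns true if a file was written, false if the write was skipped.
    boolean writeIndexIfAbsent(int spillId, String indexData) {
        if (spillFileIndexPaths.containsKey(spillId)) {
            return false; // already written; do not overwrite
        }
        Path indexFile = dir.resolve("spill_" + spillId + ".index");
        try {
            Files.write(indexFile, indexData.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spillFileIndexPaths.put(spillId, indexFile);
        return true;
    }

    // Returns the recorded index path, or null if none was written.
    Path indexPath(int spillId) {
        return spillFileIndexPaths.get(spillId);
    }
}
```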

 
Marked TEZ-2132 as the uber ticket for fault tolerance & speculation.

> Support pipelined data transfer for ordered output
> --------------------------------------------------
>
>                 Key: TEZ-2001
>                 URL: https://issues.apache.org/jira/browse/TEZ-2001
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2001.1.patch, TEZ-2001.2.patch, TEZ-2001.3.patch, 
> TEZ-2001.4.patch, TEZ-2001.5.patch, TEZ-2001.6.patch, benchmark_q17_10TB.png, 
> dag_plan.jpg
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
