[ https://issues.apache.org/jira/browse/TEZ-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-2001:
----------------------------------
Attachment: TEZ-2001.6.patch
Changing "remaining" in the Fetcher from a Set to a List leads to some
inefficiency, since this list can hold ~30 entries and remove() calls on it can
be expensive. We may want to fix this later, either by including the spillId in
the hashCode or by using a wrapping structure just for this.
- SpillId cannot be added to the hashCode, as it would break ShuffleScheduler's
shuffleInfoEventsMap. We might have to consider using a Map keyed by an
identifier instead. Will create a separate JIRA for this.
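The "Map with an identifier" idea could look roughly like the sketch below. It is a hypothetical illustration, not Tez code: SpillKey, RemainingInputs, add and markFetched are invented names, and the String payload stands in for the real attempt identifier. Keying the map on (input identifier, spill id) turns removal into a hash lookup instead of a linear scan of a List, while the identifier class's own hashCode stays unchanged, so a map such as shuffleInfoEventsMap would not be affected.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch: track outstanding fetches in a map keyed by
 * (inputIdentifier, spillId) instead of a List, so removal is an O(1)
 * average-case hash lookup rather than a linear scan.
 */
class RemainingInputs {

    /** Hypothetical composite key; spillId lives here, not in the payload's hashCode. */
    static final class SpillKey {
        final int inputIdentifier;
        final int spillId;

        SpillKey(int inputIdentifier, int spillId) {
            this.inputIdentifier = inputIdentifier;
            this.spillId = spillId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SpillKey)) return false;
            SpillKey other = (SpillKey) o;
            return inputIdentifier == other.inputIdentifier && spillId == other.spillId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(inputIdentifier, spillId);
        }
    }

    // LinkedHashMap preserves insertion order, like the List it would replace.
    private final Map<SpillKey, String> remaining = new LinkedHashMap<>();

    void add(int inputIdentifier, int spillId, String attemptInfo) {
        remaining.put(new SpillKey(inputIdentifier, spillId), attemptInfo);
    }

    /** Returns true if the entry was still outstanding and has now been removed. */
    boolean markFetched(int inputIdentifier, int spillId) {
        return remaining.remove(new SpillKey(inputIdentifier, spillId)) != null;
    }

    int size() {
        return remaining.size();
    }

    public static void main(String[] args) {
        RemainingInputs r = new RemainingInputs();
        r.add(7, 0, "attempt_7_spill_0");
        r.add(7, 1, "attempt_7_spill_1");
        System.out.println(r.markFetched(7, 0)); // true
        System.out.println(r.markFetched(7, 0)); // false (already removed)
        System.out.println(r.size());            // 1
    }
}
```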
..PathComponent - so it will work. We should probably have the fetchers use a
method from TezTaskOutputFiles to be more consistent....
- DiskMerger code works fine with this change. Added a TODO to make use of
TezTaskOutputFiles in FetcherOrderedGrouped. Will add a method to
TezTaskOutputFiles in a follow-up JIRA.
Minor: "Speculative execution needs to be turned when using this parameter" -
"off" is missing.
- Fixed. Missed it for the pipelined shuffle case.
ShuffleScheduler - dedupedList.put(inputNumber, id); - Is it possible for id to
have an older revision than what is in the oldList? I think that check should
be in place.
- The old revision is removed from oldIdList. Please let me know if I am missing
anything here.
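A minimal sketch of the check under discussion: when de-duplicating by input number, an older revision should never replace a newer one. DedupByInput, InputAttempt and dedupe are hypothetical names, and "revision" is assumed here to be an attempt number; per the reply above, the patch handles this by removing the old revision from oldIdList rather than with this exact logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: de-duplicate by input number while never letting an
 * older revision overwrite a newer one. "Revision" is modelled as an
 * attempt number (an assumption, not taken from the patch).
 */
class DedupByInput {

    static final class InputAttempt {
        final int inputNumber;
        final int attemptNumber;

        InputAttempt(int inputNumber, int attemptNumber) {
            this.inputNumber = inputNumber;
            this.attemptNumber = attemptNumber;
        }

        @Override
        public String toString() {
            return inputNumber + ":" + attemptNumber;
        }
    }

    /** Keeps, for each input number, only the attempt with the highest attempt number. */
    static List<InputAttempt> dedupe(List<InputAttempt> ids) {
        Map<Integer, InputAttempt> deduped = new LinkedHashMap<>();
        for (InputAttempt id : ids) {
            // merge() inserts if absent; otherwise keeps whichever revision is newer.
            deduped.merge(id.inputNumber, id,
                (existing, candidate) ->
                    candidate.attemptNumber > existing.attemptNumber ? candidate : existing);
        }
        return new ArrayList<>(deduped.values());
    }

    public static void main(String[] args) {
        List<InputAttempt> ids = List.of(
            new InputAttempt(3, 1),
            new InputAttempt(5, 0),
            new InputAttempt(3, 0)); // older revision of input 3 arrives late
        System.out.println(dedupe(ids)); // [3:1, 5:0]
    }
}
```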
DefaultSorter - "if (spillRecord == null) { ... else writeIndexFile" - By this
point, indexCacheList will always be populated, which means we could end up
overwriting previously written spill index files.
- Fixed. The code now checks whether spillFileIndexPaths already has the path
details; if so, the index files are not written again.
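The fix can be pictured roughly as follows: record each spill's index path once it has been written, and skip the write if the path is already known. SpillIndexTracker, writeIndexIfAbsent and the counting writeIndexFile stub are hypothetical stand-ins, not DefaultSorter's actual fields or methods; the stub only counts calls instead of touching disk.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: remember which spill index files have already been
 * written and skip rewriting them, so earlier index files are not overwritten.
 */
class SpillIndexTracker {

    // spill number -> path of the index file already written for that spill
    private final Map<Integer, String> spillFileIndexPaths = new HashMap<>();
    private int indexFilesWritten = 0;

    /** Writes the index file for a spill unless one has already been recorded. */
    boolean writeIndexIfAbsent(int spillNumber, String indexPath) {
        if (spillFileIndexPaths.containsKey(spillNumber)) {
            return false; // already written: writing again would overwrite it
        }
        writeIndexFile(indexPath);
        spillFileIndexPaths.put(spillNumber, indexPath);
        return true;
    }

    // Stand-in for the real index-file write; here it only counts calls.
    private void writeIndexFile(String indexPath) {
        indexFilesWritten++;
    }

    int indexFilesWritten() {
        return indexFilesWritten;
    }

    public static void main(String[] args) {
        SpillIndexTracker t = new SpillIndexTracker();
        t.writeIndexIfAbsent(0, "spill0.out.index");
        t.writeIndexIfAbsent(1, "spill1.out.index");
        t.writeIndexIfAbsent(0, "spill0.out.index"); // skipped
        System.out.println(t.indexFilesWritten());   // 2
    }
}
```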
Marked TEZ-2132 as the uber ticket for fault tolerance & speculation.
> Support pipelined data transfer for ordered output
> --------------------------------------------------
>
> Key: TEZ-2001
> URL: https://issues.apache.org/jira/browse/TEZ-2001
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2001.1.patch, TEZ-2001.2.patch, TEZ-2001.3.patch,
> TEZ-2001.4.patch, TEZ-2001.5.patch, TEZ-2001.6.patch, benchmark_q17_10TB.png,
> dag_plan.jpg
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)