[
https://issues.apache.org/jira/browse/TEZ-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529928#comment-15529928
]
Kuhu Shukla commented on TEZ-3361:
----------------------------------
[~jeagles], Thank you for the latest patch!
{code}
UnorderedKVInput.java:
boolean compositeFetch = ShuffleUtils.isTezShuffleHandler(conf);
{code}
The config key for composite fetch should be added to the static confKeys Set.
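To illustrate what I mean, here is a minimal sketch. The class name and both key
strings are hypothetical placeholders, not the real Tez constants, and I am
assuming the set is exposed through an unmodifiable getter:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ConfKeysSketch {
  // Hypothetical key names, for illustration only (not real Tez constants).
  static final String EXISTING_KEY = "example.existing.key";
  static final String COMPOSITE_FETCH_KEY = "example.shuffle.composite-fetch.key";

  // Static set of every config key the input reads.
  private static final Set<String> confKeys = new HashSet<>();

  static {
    confKeys.add(EXISTING_KEY);
    // The key consulted for composite fetch is registered alongside the others.
    confKeys.add(COMPOSITE_FETCH_KEY);
  }

  // Read-only view so callers cannot alter the registered keys.
  static Set<String> getConfigurationKeySet() {
    return Collections.unmodifiableSet(confKeys);
  }
}
```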
{code}
Fetcher.java
/*
if (shouldRetry(srcAttemptId, ioe)) {
//release mem/file handles
cleanupFetchedInput(fetchedInput);
throw new FetcherReadTimeoutException(ioe);
}
{code}
It would be nice if the fetchInputs() error handling could retry all of the
inputs when even one of them fails to fetch. A similar addition is needed for
FetcherOrderedGrouped.
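A rough sketch of the retry-all behavior, not taken from the patch. InputReader,
Result and the String input ids are hypothetical stand-ins for the real fetcher
types:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class RetryAllSketch {
  // Hypothetical stand-in for reading one input from the shuffle stream.
  interface InputReader {
    byte[] read(String inputId) throws IOException;
  }

  static final class Result {
    final List<String> fetched = new ArrayList<>();
    final List<String> toRetry = new ArrayList<>();
  }

  static Result fetchInputs(List<String> inputIds, InputReader reader) {
    Result result = new Result();
    for (String id : inputIds) {
      try {
        reader.read(id);
        result.fetched.add(id);
      } catch (IOException ioe) {
        // One failure invalidates the batch: drop what was fetched so far
        // (the real code would release mem/file handles here) and mark
        // every input in the request for retry.
        result.fetched.clear();
        result.toRetry.addAll(inputIds);
        return result;
      }
    }
    return result;
  }
}
```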
As we discussed offline:
{code}
if (header.getCompressedLength() == 0) {
// Empty partitions are already accounted for
continue;
}
{code}
A similar change is needed in the OrderedGrouped case when partLength is zero.
One other follow-up would be to remove empty partitions altogether from the
shuffle header.
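For the follow-up, a sketch of the idea under my own assumptions: PartitionHeader
below is a hypothetical simplification, not the actual shuffle header class, and
the filtering would happen wherever the headers are produced:

```java
import java.util.ArrayList;
import java.util.List;

final class EmptyPartitionSketch {
  // Hypothetical, simplified header: partition id plus its length in bytes.
  static final class PartitionHeader {
    final int partition;
    final long partLength;

    PartitionHeader(int partition, long partLength) {
      this.partition = partition;
      this.partLength = partLength;
    }
  }

  // Keep only headers for partitions that actually carry data, so the
  // fetcher never has to special-case a zero-length partition.
  static List<PartitionHeader> withoutEmpty(List<PartitionHeader> headers) {
    List<PartitionHeader> nonEmpty = new ArrayList<>();
    for (PartitionHeader header : headers) {
      if (header.partLength == 0) {
        continue; // empty partition, nothing to fetch
      }
      nonEmpty.add(header);
    }
    return nonEmpty;
  }
}
```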
> Fetch Multiple Partitions from the Shuffle Handler
> --------------------------------------------------
>
> Key: TEZ-3361
> URL: https://issues.apache.org/jira/browse/TEZ-3361
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: TEZ-3361.1.patch, TEZ-3361.2.patch, TEZ-3361.3.patch
>
>
> Provide an API that allows for fetching multiple partitions at once from a
> single upstream task. This is to better support auto-reduce parallelism where
> a single downstream task is impersonating several (possibly?) consecutive
> downstream tasks.