[ https://issues.apache.org/jira/browse/TEZ-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440192#comment-15440192 ]
Jason Lowe commented on TEZ-2618:
---------------------------------
I think this can happen in a disk-full scenario. Once the nodemanager detects
a disk as full, that disk is left out of the list of local dirs given to any
container launched afterward. Consider the following scenario:
# A container runs and creates some intermediate data on a local disk
# The disk later fills up and is detected by the nodemanager, but other
local disks are still usable
# A subsequent container launches with a list of local dirs that does _not_
contain the disk where the intermediate data is located
# The local fetch cannot find the intermediate data in the list of local dirs
# An HTTP fetch _can_ find the data because the shuffle handler considers all
disks, full or not
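The five steps above can be reproduced with a small, self-contained Java simulation (Java 9+ for List.of). It assumes nothing about Tez or Hadoop internals: the class name, the findIn() helper, and the directory layout are hypothetical, and findIn() only approximates an ordered search over configured dirs. It shows the same spill file being invisible to a lookup restricted to the new container's local dirs, yet visible to a lookup that scans every disk, as the shuffle handler does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

public class DiskFullLookupDemo {

    // Hypothetical helper: search an ordered list of local dirs for a relative path,
    // similar in spirit to a search over the configured local dirs.
    static Optional<Path> findIn(List<Path> dirs, String relative) {
        for (Path dir : dirs) {
            Path candidate = dir.resolve(relative);
            if (Files.exists(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Two temp dirs stand in for two physical disks (left behind for brevity).
        Path disk1 = Files.createTempDirectory("disk1");
        Path disk2 = Files.createTempDirectory("disk2");
        String spill = "output/attempt_0/file.out";

        // Step 1: an earlier container writes intermediate data on disk1.
        Path spillPath = disk1.resolve(spill);
        Files.createDirectories(spillPath.getParent());
        Files.write(spillPath, new byte[] {1, 2, 3});

        // Steps 2-3: disk1 is later marked full, so a newly launched container
        // only sees disk2 in its list of local dirs.
        List<Path> containerDirs = List.of(disk2);
        // The shuffle handler considers all disks, full or not.
        List<Path> shuffleHandlerDirs = List.of(disk1, disk2);

        // Step 4: the local lookup misses; step 5: the all-disks lookup succeeds.
        System.out.println("local fetch finds spill: "
                + findIn(containerDirs, spill).isPresent());
        System.out.println("shuffle handler finds spill: "
                + findIn(shuffleHandlerDirs, spill).isPresent());
    }
}
```

Running main() should print false for the local lookup and true for the shuffle-handler lookup, which is exactly the gap an HTTP fallback would cover.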
> In Ordered Fetcher, if Local Fetch fails, fallback and try http Fetch before
> returning a failure
> ------------------------------------------------------------------------------------------------
>
> Key: TEZ-2618
> URL: https://issues.apache.org/jira/browse/TEZ-2618
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Saikat
> Assignee: Saikat
> Attachments: TEZ-2618.1.patch, TEZ-2618.patch
>
>
> In the setupLocalDiskFetch() method (invoked when the fetcher is on the
> same host as the target map host), first check whether the target spill
> file can be opened using localDirAllocator.getLocalPathToRead(). The
> localDirAllocator searches the list of configured dirs for the file.
> In disk-full scenarios, if the path is not found, the fetcher should try an
> HTTP fetch.
> Proposed solution:
> In local fetch mode, the fetcher should first call getLocalPathToRead() for
> all pending maps. Local fetch is thus divided into two stages: first, fetch
> the maps whose paths were found via LocalDirAllocator; second, construct an
> HTTP fallback fetch list for the maps that could not be found via
> LocalDirAllocator.getLocalPathToRead() and fetch them over HTTP.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)