[
https://issues.apache.org/jira/browse/MAPREDUCE-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105727#comment-13105727
]
Eli Collins commented on MAPREDUCE-3011:
----------------------------------------
@Todd - Yes, to re-trigger you need to restart the TT. This is how the code
currently works: once a directory is removed from LocalStorage's "good list"
it is never put back while the TT is running, i.e., once a dir is identified as
bad it won't be used by the TT again. LocalDirAllocator#confChanged tries to notice
when a new dir is added to the conf, but we don't add new MR local dirs at
runtime, so this feature isn't used. Per HADOOP-7551, LocalDirAllocator (common)
and LocalStorage (mr) are currently independent but should be aware of each
other.
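A minimal sketch of that "good list" behavior, assuming a simplified model. GoodDirList and its methods are hypothetical names for illustration, not the actual LocalStorage API. Dirs that fail a check are dropped, and dropped dirs are never re-checked, so a repaired disk only comes back after a TT restart:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the TT "good list" behavior: a dir that fails a check
// is dropped and never re-added while the process runs. GoodDirList is an
// illustrative name, not the actual org.apache.hadoop.mapred.LocalStorage API.
final class GoodDirList {
  private final List<String> goodDirs;

  GoodDirList(List<String> configuredDirs) {
    this.goodDirs = new ArrayList<>(configuredDirs);
  }

  // Re-check only the dirs still considered good and drop any that fail.
  // Dirs dropped earlier are never re-checked, so a repaired disk is only
  // picked up again after a restart.
  synchronized void checkDirs(Predicate<String> isHealthy) {
    goodDirs.removeIf(dir -> !isHealthy.test(dir));
  }

  synchronized List<String> getGoodDirs() {
    return Collections.unmodifiableList(new ArrayList<>(goodDirs));
  }

  public static void main(String[] args) {
    GoodDirList dirs =
        new GoodDirList(Arrays.asList("/d1/mapred", "/d2/mapred", "/d3/mapred"));
    dirs.checkDirs(dir -> !dir.startsWith("/d2")); // /d2 fails a check once
    dirs.checkDirs(dir -> true);                   // /d2 is healthy again...
    System.out.println(dirs.getGoodDirs());        // ...but stays excluded: [/d1/mapred, /d3/mapred]
  }
}
```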
@Ravi - LocalDirAllocator already keeps track of the valid dirs itself. Once
there is a bad dir, LocalDirAllocator#confChanged executes for every call to get
a local directory, and it's this code that calls checkDirs on each local directory.
It turns out the version of checkDirs that doesn't take a permissions parameter
is not as expensive as I thought (the method that takes a permission forks a
call to ls for each directory, which is expensive). However, confChanged creates
a new DF object for each local dir, which has the side effect of resetting the
df refresh interval, so when LocalDirAllocator uses each DF it forks a call to
df rather than reusing the last cached result.
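To illustrate why recreating the DF objects defeats the caching, here is a rough model, assuming a df wrapper that caches its last result for a refresh interval and has nothing cached when freshly constructed. CachedDf and DfCachingDemo are hypothetical classes, not the real org.apache.hadoop.fs.DF, and the probe only counts calls instead of forking df:

```java
import java.util.function.LongSupplier;

final class DfCachingDemo {
  static int forks = 0;

  // Stand-in for forking `df`; just counts invocations.
  static long fakeDf() {
    forks++;
    return 42L;
  }

  public static void main(String[] args) {
    // One DF reused across three lookups inside the refresh interval.
    CachedDf reused = new CachedDf(DfCachingDemo::fakeDf, 3000);
    for (int i = 0; i < 3; i++) reused.getAvailable(1000 + i);
    System.out.println("reused DF: " + forks + " fork(s)"); // 1 fork(s)

    // A fresh DF per lookup, as when confChanged rebuilds them on every call.
    forks = 0;
    for (int i = 0; i < 3; i++) {
      new CachedDf(DfCachingDemo::fakeDf, 3000).getAvailable(1000 + i);
    }
    System.out.println("new DF per call: " + forks + " fork(s)"); // 3 fork(s)
  }
}

// Hypothetical model of a df wrapper that caches its last result for
// refreshIntervalMs. This is NOT the real org.apache.hadoop.fs.DF class.
final class CachedDf {
  private final LongSupplier probe;
  private final long refreshIntervalMs;
  private boolean hasCached = false; // a brand-new object has no cached result
  private long lastRunMs;
  private long cachedAvailable;

  CachedDf(LongSupplier probe, long refreshIntervalMs) {
    this.probe = probe;
    this.refreshIntervalMs = refreshIntervalMs;
  }

  // Re-run the probe only if nothing is cached yet or the interval has passed.
  long getAvailable(long nowMs) {
    if (!hasCached || nowMs - lastRunMs >= refreshIntervalMs) {
      cachedAvailable = probe.getAsLong();
      lastRunMs = nowMs;
      hasCached = true;
    }
    return cachedAvailable;
  }
}
```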
In short, I think this is expensive whenever the configured dirs differ from the
list of valid dirs maintained by LocalDirAllocator. If we remove bad dirs from
the conf in the TT, then they won't differ. Alternatively, we could modify
LocalDirAllocator to ignore bad directories, but that would conflict with its
current design, which explicitly tries to notice a difference between the set of
valid and configured dirs.
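A simplified model of the two cases, assuming the allocator compares the configured dir list against the valid dirs it saved from its last check and does a full re-check on any mismatch. DirAllocatorModel, getValidDirs and the fullChecks counter are hypothetical names, not Hadoop's LocalDirAllocator source. Leaving the bad dir in the conf re-checks on every call; pruning it from the conf makes the lists match after the first check:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of the behavior described above; names are
// illustrative and this is not Hadoop's LocalDirAllocator source.
final class DirAllocatorModel {
  private final Predicate<String> isHealthy;
  private String savedValidDirs = null;       // valid dirs from the last check
  private List<String> validDirs = new ArrayList<>();
  int fullChecks = 0;                         // times every dir was re-checked

  DirAllocatorModel(Predicate<String> isHealthy) {
    this.isHealthy = isHealthy;
  }

  // Called on every request for a local path with the current conf value.
  // A mismatch between configured and saved valid dirs triggers a full
  // re-check; this branch stands in for the expensive path.
  List<String> getValidDirs(String configuredDirs) {
    if (!configuredDirs.equals(savedValidDirs)) {
      fullChecks++;
      List<String> dirs = new ArrayList<>();
      for (String dir : configuredDirs.split(",")) {
        if (isHealthy.test(dir)) dirs.add(dir);
      }
      validDirs = dirs;
      savedValidDirs = String.join(",", dirs);
    }
    return Collections.unmodifiableList(validDirs);
  }

  public static void main(String[] args) {
    Predicate<String> healthyExceptD2 = dir -> !dir.equals("/d2");

    // Bad dir left in the conf: configured != valid, so every call re-checks.
    DirAllocatorModel stale = new DirAllocatorModel(healthyExceptD2);
    for (int i = 0; i < 5; i++) stale.getValidDirs("/d1,/d2,/d3");
    System.out.println("bad dir kept in conf: " + stale.fullChecks); // 5

    // Proposed fix: the TT rewrites the conf to the valid dirs after a check.
    DirAllocatorModel pruned = new DirAllocatorModel(healthyExceptD2);
    String conf = String.join(",", pruned.getValidDirs("/d1,/d2,/d3"));
    for (int i = 0; i < 4; i++) pruned.getValidDirs(conf);
    System.out.println("bad dir removed from conf: " + pruned.fullChecks); // 1
  }
}
```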
> TT should remove bad local dirs from conf to prevent constant disk checking
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-3011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3011
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: tasktracker
> Affects Versions: 0.20.204.0
> Reporter: Eli Collins
> Fix For: 0.20.205.0
>
>
> Per HADOOP-7551, the TT does not remove bad mapred.local.dirs from the conf, so
> after a single disk failure *every* call to get a local path for reading or
> writing results in a disk check of *all* configured local dirs. After
> detecting that a local dir is bad we should remove it from the conf so that
> we don't repeatedly perform this expensive operation.