[ https://issues.apache.org/jira/browse/MAPREDUCE-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105727#comment-13105727 ]

Eli Collins commented on MAPREDUCE-3011:
----------------------------------------

@Todd - Yes, to re-trigger you need to restart the TT. This is how the code
currently works: once a directory is removed from LocalStorage's "good list"
it is never put back while the TT is running, i.e., once a dir is identified as
bad, the TT won't use it again. LocalDirAllocator#confChanged tries to notice
when a new dir is added to the conf, but we don't add new MR local dirs at
runtime, so this feature isn't used. Per HADOOP-7551, LocalDirAllocator (common)
and LocalStorage (mr) are currently independent but should be aware of each
other.
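
To make the "never put back" behaviour concrete, here is a minimal Java sketch of a good list that only ever shrinks. The classes GoodDirList and GoodDirListDemo, their methods, and the example paths are hypothetical illustrations, not the real LocalStorage API. The only way to get a dir back is to build a new instance, which models restarting the TT.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model of the TT's "good list" of local dirs (not Hadoop's
 * LocalStorage class). Dirs are removed once found bad, and there is
 * deliberately no method to add one back; only a fresh instance
 * (i.e. a TT restart) starts again from the full configured list.
 */
class GoodDirList {
    private final List<String> good;

    GoodDirList(List<String> configuredDirs) {
        this.good = new ArrayList<>(configuredDirs);
    }

    /** Drop a dir that failed a disk check; it is never re-added. */
    void markBad(String dir) {
        good.remove(dir);
    }

    /** Read-only view of the dirs the TT will still use. */
    List<String> goodDirs() {
        return Collections.unmodifiableList(good);
    }
}

class GoodDirListDemo {
    public static void main(String[] args) {
        List<String> conf = List.of("/data/1/mapred", "/data/2/mapred");
        GoodDirList storage = new GoodDirList(conf);
        storage.markBad("/data/1/mapred");
        // Even if the disk is later repaired, this instance never uses it again.
        System.out.println(storage.goodDirs());                  // [/data/2/mapred]
        // "Restarting the TT" = rebuilding the list from the configuration.
        System.out.println(new GoodDirList(conf).goodDirs());    // [/data/1/mapred, /data/2/mapred]
    }
}
```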

@Ravi LocalDirAllocator already keeps track of the valid dirs itself. Once
there is a bad dir, LocalDirAllocator#confChanged executes for every call to get
a local directory, and it's this code that calls checkDirs on each local directory.
It turns out the version of checkDirs that doesn't take a permissions parameter
is not as expensive as I thought (the version that takes a permission forks a
call to ls for each directory, which is expensive). However, confChanged creates
a new DF object for each local dir, which has the side effect of resetting the
df interval: when LocalDirAllocator next uses each DF, it forks a call to df
instead of using the cached last result.
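
Why recreating DF objects defeats the interval cache can be modelled with a small Java sketch. CachedUsage is a hypothetical stand-in for a DF-like object (not Hadoop's DF class) that only re-runs an expensive probe, standing in for forking df, once its refresh interval has elapsed; the 3000 ms interval and the injectable clock are arbitrary choices for illustration. Reusing one instance mostly hits the cache, while building a new instance per lookup forces a probe every time.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

/**
 * Hypothetical DF-like object (not Hadoop's DF class): caches the last
 * free-space reading and only re-runs the expensive probe (standing in
 * for forking df) once refreshIntervalMs has elapsed since the last probe.
 */
class CachedUsage {
    private final long refreshIntervalMs;
    private final LongSupplier clock;     // injectable clock in ms, for testing
    private final AtomicInteger probes;   // counts simulated df forks
    private boolean primed = false;       // false until the first probe runs
    private long lastRefresh;
    private long available;

    CachedUsage(long refreshIntervalMs, LongSupplier clock, AtomicInteger probes) {
        this.refreshIntervalMs = refreshIntervalMs;
        this.clock = clock;
        this.probes = probes;
    }

    long getAvailable() {
        long now = clock.getAsLong();
        if (!primed || now - lastRefresh >= refreshIntervalMs) {
            probes.incrementAndGet();     // the costly step: fork + parse df
            available = 42L;              // placeholder reading
            lastRefresh = now;
            primed = true;
        }
        return available;
    }
}

class CachedUsageDemo {
    public static void main(String[] args) {
        AtomicInteger probes = new AtomicInteger();
        long[] now = {0};
        LongSupplier clock = () -> now[0];

        // One long-lived object: 10 lookups inside the interval -> 1 probe.
        CachedUsage reused = new CachedUsage(3000, clock, probes);
        for (int i = 0; i < 10; i++) { reused.getAvailable(); now[0] += 10; }
        System.out.println("reused:    " + probes.get() + " probe(s)");  // 1

        // A fresh object per lookup, as when confChanged rebuilds DFs: 10 probes.
        probes.set(0);
        for (int i = 0; i < 10; i++) {
            new CachedUsage(3000, clock, probes).getAvailable();
            now[0] += 10;
        }
        System.out.println("recreated: " + probes.get() + " probe(s)");  // 10
    }
}
```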

In short, I think every local-path lookup is expensive if the configured dirs
differ from the list of valid dirs maintained by LocalDirAllocator. If the TT
removes bad dirs from the conf, the two lists won't differ. Alternatively, we
could modify LocalDirAllocator to ignore bad directories, but that would conflict
with its current design, which explicitly tries to notice a difference between
the set of valid and configured dirs.
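
A toy allocator can show why the mismatch matters and why pruning the conf helps. ToyAllocator, its confChanged method and the healthy predicate are hypothetical simplifications modelled only on the behaviour described in this comment (the allocator remembers the valid dirs and re-checks every configured dir whenever the conf it is given differs from them); this is not LocalDirAllocator's real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

/**
 * Hypothetical, heavily simplified allocator (not LocalDirAllocator's real
 * code). It remembers the list of valid dirs and re-checks every configured
 * dir whenever the conf it is handed differs from that list.
 */
class ToyAllocator {
    int dirChecks = 0;                         // per-dir disk checks performed
    private final Predicate<String> healthy;   // stands in for checking one dir
    private List<String> savedDirs = null;     // valid dirs from the last check
    private List<String> validDirs = new ArrayList<>();

    ToyAllocator(Predicate<String> healthy) {
        this.healthy = healthy;
    }

    private void confChanged(List<String> configured) {
        if (Objects.equals(configured, savedDirs)) {
            return;                            // conf matches: no disk checks
        }
        validDirs = new ArrayList<>();
        for (String dir : configured) {
            dirChecks++;
            if (healthy.test(dir)) {
                validDirs.add(dir);
            }
        }
        savedDirs = new ArrayList<>(validDirs);
    }

    String getLocalPath(List<String> configured) {
        confChanged(configured);
        if (validDirs.isEmpty()) {
            throw new IllegalStateException("no valid local dirs");
        }
        return validDirs.get(0);               // simplification: always the first valid dir
    }
}

class ToyAllocatorDemo {
    public static void main(String[] args) {
        Predicate<String> healthy = dir -> !dir.equals("/data/1/mapred");

        // Conf still lists the bad dir: every call re-checks all 3 dirs.
        List<String> conf = List.of("/data/1/mapred", "/data/2/mapred", "/data/3/mapred");
        ToyAllocator a = new ToyAllocator(healthy);
        for (int i = 0; i < 5; i++) a.getLocalPath(conf);
        System.out.println("unpruned conf: " + a.dirChecks + " checks");  // 15

        // TT has pruned the bad dir from the conf: only the first call checks.
        List<String> pruned = List.of("/data/2/mapred", "/data/3/mapred");
        ToyAllocator b = new ToyAllocator(healthy);
        for (int i = 0; i < 5; i++) b.getLocalPath(pruned);
        System.out.println("pruned conf:   " + b.dirChecks + " checks");  // 2
    }
}
```

The design point is the comparison in confChanged: as long as the conf and the saved valid list disagree, the early return never fires, so removing bad dirs from the conf (rather than teaching the allocator to skip them) is what stops the repeated checks.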

> TT should remove bad local dirs from conf to prevent constant disk checking
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3011
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3011
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: tasktracker
>    Affects Versions: 0.20.204.0
>            Reporter: Eli Collins
>             Fix For: 0.20.205.0
>
>
> Per HADOOP-7551, the TT does not remove bad mapred.local.dirs from the conf, so
> after a single disk failure *every* call to get a local path for reading or
> writing results in a disk check of *all* configured local dirs. After
> detecting that a local dir is bad, we should remove it from the conf so that
> we don't repeatedly perform this expensive operation.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira