steveloughran commented on PR #5412:
URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1439803446

   Currently `LocalDirAllocator.getLocalPathForWrite()` creates any absent 
target dirs on first invocation, or when the config is changed via 
`confChanged()`.
   
   On later calls, `checkWrite` is called after a dir has been selected. This 
method contains both parent dir recreation *and* an explicit attempt to create 
a file under it.
   
   But the parent dir creation is too late for the disk space allocator, which 
will have already got 0 as the available space if a dir doesn't exist.
   
   What does this mean? It means that long-lived LocalDirAllocator instances 
cannot recover from the deletion of temp directories, even though checkWrite is 
meant to be able to do so. Calling mkdirs() earlier *and ignoring the result* 
means that if hadoop.tmp.dir has been deleted there will be a best-effort 
attempt to recreate it before looking for free disk space. By not checking the 
return code we avoid changing where things may fail, and all the complications 
that might bring, so it is not a perfect fix. It's just a lot simpler than 
something more sophisticated, like handling the special case where there are no 
free directories by triggering a whole new confChanged()-style rescan in the 
hope that it will bring back a directory or even a disk volume.
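
   A minimal sketch of the ordering described above, using only `java.io.File` 
rather than the real Hadoop classes. The class and method names 
(`BestEffortMkdirs`, `usableSpaceAfterMkdirs`) are hypothetical and not part of 
the patch, and `File.getUsableSpace()` stands in for whatever free-space probe 
the allocator actually uses:

```java
import java.io.File;

class BestEffortMkdirs {

    /**
     * Hypothetical helper: try to recreate the directory, then ask how much
     * space is usable under it. Not the actual LocalDirAllocator code.
     */
    static long usableSpaceAfterMkdirs(File dir) {
        // Result deliberately ignored: if this fails, the later write check
        // is still the place where the failure surfaces.
        dir.mkdirs();
        // For a path that does not exist this is documented to return 0,
        // which is why the mkdirs() has to come first.
        return dir.getUsableSpace();
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        // Assumed stand-in for a temp dir that was deleted after startup.
        File deleted = new File(tmp, "alloc-sketch-" + System.nanoTime() + "/local");

        System.out.println("space before mkdirs: " + deleted.getUsableSpace());
        System.out.println("space after mkdirs:  " + usableSpaceAfterMkdirs(deleted));
    }
}
```

   If the mkdirs() call fails (read-only parent, lost volume), the space probe 
still reports whatever it would have reported before, so the failure path is 
unchanged.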


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

