steveloughran commented on PR #5412: URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1439803446
Currently `LocalDirAllocator.getLocalPathForWrite()` creates all target dirs, if absent, on first invocation or when the config is changed via `confChanged()`.

On later calls, `checkWrite` is called after a dir has been selected. That method both recreates the parent dir *and* explicitly attempts to create a file under it. But the parent dir creation comes too late for the disk space allocator, which will already have got 0 as the available space if a dir doesn't exist.

What does this mean? It means that long-lived `LocalDirAllocator` instances cannot recover from the deletion of temp directories, even though `checkWrite` is meant to be able to do so.

Calling `mkdirs()` earlier *and ignoring the result* means that if `hadoop.tmp.dir` has been deleted, there will be a best-effort attempt to recreate it before looking for free disk space.

By not checking the return code we avoid changing where things may fail, and all the complications that might bring, so it is not a perfect fix. It's just a lot simpler than something more sophisticated, like handling the special case where there are no free directories by triggering a whole new `confChanged()`-style rescan in the hope that it will bring back a directory or even a disk volume.
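To illustrate the ordering problem described above, here is a minimal sketch using only `java.io.File`. It is not the Hadoop code: the class `MkdirsBeforeSpaceCheck` and both helper methods are hypothetical names made up for this example. It assumes that `File.getUsableSpace()` reports 0 for a directory that does not exist, which matches the behaviour the comment describes for the disk space allocator.

```java
import java.io.File;

// Hypothetical demo class; not part of Hadoop's LocalDirAllocator.
final class MkdirsBeforeSpaceCheck {

    /** Queries free space without recreating the directory first. */
    static long usableSpaceWithoutMkdirs(File dir) {
        // For a deleted directory this is expected to report 0,
        // so an allocator would treat the directory as full.
        return dir.getUsableSpace();
    }

    /** Best-effort recreation of the directory, then the free space query. */
    static long usableSpaceAfterMkdirs(File dir) {
        // Return value deliberately ignored: if creation fails, the later
        // checks fail exactly where they failed before.
        dir.mkdirs();
        return dir.getUsableSpace();
    }

    public static void main(String[] args) {
        // A directory under the JVM temp dir that does not exist yet,
        // standing in for a deleted hadoop.tmp.dir.
        File dir = new File(System.getProperty("java.io.tmpdir"),
                "allocator-demo-" + System.nanoTime());

        System.out.println("exists before: " + dir.exists());
        System.out.println("usable space, no mkdirs: "
                + usableSpaceWithoutMkdirs(dir));
        System.out.println("usable space, mkdirs first: "
                + usableSpaceAfterMkdirs(dir));

        dir.delete(); // clean up the demo directory
    }
}
```

Ignoring the `mkdirs()` result keeps the failure points where they were: if the directory cannot be recreated, the space query still sees 0 and the existing error handling applies.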