[ 
https://issues.apache.org/jira/browse/HADOOP-18636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692119#comment-17692119
 ] 

ASF GitHub Bot commented on HADOOP-18636:
-----------------------------------------

steveloughran commented on PR #5412:
URL: https://github.com/apache/hadoop/pull/5412#issuecomment-1439803446

   Currently `LocalDirAllocator.getLocalPathForWrite()` creates all target 
dirs if they are absent, either on first invocation or when the config is 
changed, via `confChanged()`.
   
   On later calls, checkWrite is called after a dir has been selected. This 
method contains both parent dir recreation *and* an explicit attempt to create 
a file under it.
   
   But the parent dir creation is too late for the disk space allocator, which 
will have already got 0 as the available space if a dir doesn't exist.
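
   To illustrate the point about missing dirs: a minimal sketch using plain 
`java.io.File` rather than the allocator itself. The class and method names 
are made up for the example, and the assumption is that the JVM reports 0 
usable bytes for a path which doesn't exist, which is what I'd expect on 
Linux and other POSIX platforms; other platforms may differ.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/** Hypothetical demo class; not part of Hadoop. */
public class MissingDirSpace {

  /** Usable bytes as reported by java.io.File for the given directory. */
  static long usableSpace(File dir) {
    // getUsableSpace() returns 0L when the size cannot be obtained,
    // which on POSIX platforms includes a path that does not exist.
    return dir.getUsableSpace();
  }

  public static void main(String[] args) throws IOException {
    File parent = Files.createTempDirectory("lda-demo").toFile();
    // Stands in for a buffer dir that was deleted after startup.
    File missing = new File(parent, "deleted-buffer-dir");

    System.out.println("parent  usable bytes: " + usableSpace(parent));
    System.out.println("missing exists=" + missing.exists()
        + " usable bytes: " + usableSpace(missing));

    parent.delete();
  }
}
```

   Any "does this dir have enough room?" filter applied to the missing dir 
will reject it before anything gets a chance to recreate it.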
   
   What does this mean? It means that long-lived LocalDirAllocator instances 
cannot recover from the deletion of temp directories, even though checkWrite is 
meant to be able to do so. Calling mkdirs() earlier *and ignoring the result* 
means that if hadoop.tmp.dir has been deleted there will be a best effort 
attempt to recreate it before looking for free disk space. By not checking 
the return code we are avoiding changing where things may fail and all the 
complications that might bring, so it is not a perfect fix. It's just a lot 
simpler than something more sophisticated like handling the special case where 
there are no free directories by triggering a whole new confChanged()-style 
rescan in the hope that will bring back a directory or even disk volume.
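
   A rough sketch of the ordering described above, again in plain 
`java.io.File` and not the actual patch. `BestEffortDirSelection` and 
`dirsWithSpace()` are hypothetical names; the real allocator has its own 
capacity and selection logic which this does not try to reproduce.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; not the LocalDirAllocator implementation. */
public class BestEffortDirSelection {

  /**
   * Returns the candidate dirs reporting at least {@code needed} usable bytes.
   * Each dir is recreated on a best-effort basis before its space is queried.
   */
  static List<File> dirsWithSpace(List<File> candidates, long needed) {
    List<File> usable = new ArrayList<>();
    for (File dir : candidates) {
      // Best effort: the return value is deliberately ignored, so a failure
      // here does not change where errors surface later on.
      dir.mkdirs();
      if (dir.getUsableSpace() >= needed) {
        usable.add(dir);
      }
    }
    return usable;
  }

  public static void main(String[] args) throws IOException {
    File root = Files.createTempDirectory("lda-demo").toFile();
    // A nested buffer dir that does not exist yet, as if it had been deleted.
    File buffer = new File(root, "s3a/buffer");
    List<File> picked = dirsWithSpace(Collections.singletonList(buffer), 1L);
    System.out.println("recreated=" + buffer.isDirectory()
        + " selected=" + picked.size());
  }
}
```

   If mkdirs() fails, the dir still reports no space and is skipped exactly 
as before, so the failure modes stay where they were.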




> LocalDirAllocator cannot recover from directory tree deletion during the life 
> of a filesystem client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18636
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18636
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, fs/azure, fs/s3
>    Affects Versions: 3.3.4
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>
> The s3a and abfs clients use LocalDirAllocator for allocating files in local 
> (temporary) storage for buffering blocks to write, and, for the s3a staging 
> committer, files being staged. 
> When initialized (or when the configuration key value is updated) 
> LocalDirAllocator enumerates all directories in the list and calls 
> {{mkdirs()}} to create them.
> When you actually ask for a file, it will look for the parent dir, and will 
> again call {{mkdirs()}}. 
> But before it does that, it looks to see if the dir has any space; if not, 
> it is excluded from the list of directories with room for data.
> And guess what: directories which don't exist report as having no space. So 
> they get excluded, and the recreation code doesn't get a chance to run.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

