[ 
https://issues.apache.org/jira/browse/FLINK-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751722#comment-15751722
 ] 

Stephan Ewen commented on FLINK-5345:
-------------------------------------

I think this is a problem with {{org.apache.commons.io.FileUtils}}: when 
something else concurrently modifies the directory, the delete fails.

We should have our own utility method for recursive directory deletion that 
retries listing and deleting the contained files, so that it is safe against 
concurrent deletes by other services.
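
A rough sketch of what such a utility could look like, only to illustrate the 
retry idea. The class name, the method name and the retry bound are made up 
here and are not existing Flink code. It assumes that an entry which has 
already disappeared counts as successfully deleted, and it does not treat 
symbolic links specially.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper, not part of Flink or commons-io. */
final class ConcurrentSafeDelete {

    // Assumed upper bound on retries; pick whatever fits the use case.
    private static final int MAX_ATTEMPTS = 10;

    /**
     * Deletes a file or directory tree. Entries that vanish while we work
     * (because another thread or service removed them) are not an error.
     */
    static void deleteRecursively(File file) throws IOException {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            // listFiles() returns null if 'file' is not a directory or is
            // already gone, so a concurrent delete simply skips this step.
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    deleteRecursively(child);
                }
            }

            // delete() returns false both when the entry is already gone and
            // when a directory is not empty (e.g. someone added a file after
            // we listed it). Only the second case needs another attempt.
            if (file.delete() || !file.exists()) {
                return;
            }
        }
        throw new IOException("Could not delete " + file.getAbsolutePath()
                + " after " + MAX_ATTEMPTS + " attempts");
    }
}
```

The important difference to {{FileUtils.deleteDirectory}} is that a missing 
entry is never turned into an exception: the loop re-lists the directory and 
only gives up when the entry still exists after the last attempt.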

> IOManager failed to properly clean up temp file directory
> ---------------------------------------------------------
>
>                 Key: FLINK-5345
>                 URL: https://issues.apache.org/jira/browse/FLINK-5345
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.1.3
>            Reporter: Robert Metzger
>              Labels: simplex, starter
>
> While testing 1.1.3 RC3, I have the following message in my log:
> {code}
> 2016-12-15 14:46:05,450 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Timer service is shutting down.
> 2016-12-15 14:46:05,452 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: control events generator (29/40) (73915a232ba09e642f9dff92f8c8773a) switched from CANCELING to CANCELED.
> 2016-12-15 14:46:05,452 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: control events generator (29/40) (73915a232ba09e642f9dff92f8c8773a).
> 2016-12-15 14:46:05,454 INFO  org.apache.flink.yarn.YarnTaskManager                         - Un-registering task and sending final execution state CANCELED to JobManager for task Source: control events generator (73915a232ba09e642f9dff92f8c8773a)
> 2016-12-15 14:46:40,609 INFO  org.apache.flink.yarn.YarnTaskManagerRunner                   - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
> 2016-12-15 14:46:40,611 INFO  org.apache.flink.runtime.blob.BlobCache                       - Shutting down BlobCache
> 2016-12-15 14:46:40,724 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://[email protected]:33635] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 2016-12-15 14:46:40,808 ERROR org.apache.flink.runtime.io.disk.iomanager.IOManager          - IOManager failed to properly clean up temp file directory: /yarn/nm/usercache/robert/appcache/application_1481291289979_0024/flink-io-f0ff3f66-b9e2-4560-881f-2ab43bc448b5
> java.lang.IllegalArgumentException: /yarn/nm/usercache/robert/appcache/application_1481291289979_0024/flink-io-f0ff3f66-b9e2-4560-881f-2ab43bc448b5/62e14e1891fe1e334c921dfd19a32a84/StreamMap_11_24/dummy_state does not exist
>         at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637)
>         at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>         at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
>         at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
>         at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>         at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
>         at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
>         at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>         at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
>         at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
>         at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>         at org.apache.flink.runtime.io.disk.iomanager.IOManager.shutdown(IOManager.java:109)
>         at org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync.shutdown(IOManagerAsync.java:185)
>         at org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync$1.run(IOManagerAsync.java:105)
> {code}
> This was the last message logged from that machine. I suspect two threads are 
> trying to clean up the directories during shutdown?


