[
https://issues.apache.org/jira/browse/FLINK-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751722#comment-15751722
]
Stephan Ewen commented on FLINK-5345:
-------------------------------------
I think that is a problem of {{org.apache.commons.io.FileUtils}}: when someone
else concurrently modifies the directory, the delete fails.
We should have our own utility method for recursive directory deletion that
retries listing and deleting the contained files, to be safe against concurrent
deletes by other services.
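A rough sketch of what such a utility could look like, to make the idea concrete. The class name {{ConcurrentSafeDelete}}, the method names, and the retry bound are all placeholders I made up for illustration, not existing Flink or Commons IO code. The two points it tries to show: a file that has already vanished counts as deleted rather than as an error, and a failed pass re-lists the directory and tries again.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;

/** Hypothetical helper; not an existing Flink or Commons IO class. */
public final class ConcurrentSafeDelete {

    /** Assumed retry bound, chosen arbitrarily for this sketch. */
    private static final int MAX_ATTEMPTS = 10;

    private ConcurrentSafeDelete() {}

    /**
     * Deletes the file or directory tree. Entries removed concurrently by
     * someone else are treated as successfully deleted.
     */
    public static void deleteRecursively(File file) throws IOException {
        IOException lastFailure = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                deleteOnce(file);
                return;
            } catch (IOException e) {
                // Another service may have added entries in the meantime;
                // re-list and try again on the next pass.
                lastFailure = e;
            }
        }
        throw lastFailure;
    }

    private static void deleteOnce(File file) throws IOException {
        // Do not descend into symbolic links; only the link itself is removed.
        if (!Files.isSymbolicLink(file.toPath())) {
            // listFiles() returns null for plain files and for directories
            // that no longer exist, so a vanished directory is simply skipped.
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    deleteOnce(child);
                }
            }
        }
        // delete() returning false is only an error if the entry is still there.
        if (!file.delete() && Files.exists(file.toPath(), LinkOption.NOFOLLOW_LINKS)) {
            throw new IOException("Could not delete " + file);
        }
    }
}
```

With something like this, {{IOManager.shutdown()}} could call {{ConcurrentSafeDelete.deleteRecursively(path)}} in place of {{FileUtils.deleteDirectory(path)}}. Whether a fixed retry count is the right policy is open for discussion.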
> IOManager failed to properly clean up temp file directory
> ---------------------------------------------------------
>
> Key: FLINK-5345
> URL: https://issues.apache.org/jira/browse/FLINK-5345
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.1.3
> Reporter: Robert Metzger
> Labels: simplex, starter
>
> While testing 1.1.3 RC3, I have the following message in my log:
> {code}
> 2016-12-15 14:46:05,450 INFO
> org.apache.flink.streaming.runtime.tasks.StreamTask - Timer service
> is shutting down.
> 2016-12-15 14:46:05,452 INFO org.apache.flink.runtime.taskmanager.Task
> - Source: control events generator (29/40)
> (73915a232ba09e642f9dff92f8c8773a) switched from CANCELING to CANCELED.
> 2016-12-15 14:46:05,452 INFO org.apache.flink.runtime.taskmanager.Task
> - Freeing task resources for Source: control events generator
> (29/40) (73915a232ba09e642f9dff92f8c8773a).
> 2016-12-15 14:46:05,454 INFO org.apache.flink.yarn.YarnTaskManager
> - Un-registering task and sending final execution state
> CANCELED to JobManager for task Source: control events generator
> (73915a232ba09e642f9dff92f8c8773a)
> 2016-12-15 14:46:40,609 INFO org.apache.flink.yarn.YarnTaskManagerRunner
> - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
> 2016-12-15 14:46:40,611 INFO org.apache.flink.runtime.blob.BlobCache
> - Shutting down BlobCache
> 2016-12-15 14:46:40,724 WARN akka.remote.ReliableDeliverySupervisor
> - Association with remote system
> [akka.tcp://[email protected]:33635] has failed, address is now gated for
> [5000] ms.
> Reason is: [Disassociated].
> 2016-12-15 14:46:40,808 ERROR
> org.apache.flink.runtime.io.disk.iomanager.IOManager - IOManager
> failed to properly clean up temp file directory:
> /yarn/nm/usercache/robert/appcache/application_1481291289979_0024/flink-io-f0ff3f66-b9e2-4560-881f-2ab43bc448b5
> java.lang.IllegalArgumentException:
> /yarn/nm/usercache/robert/appcache/application_1481291289979_0024/flink-io-f0ff3f66-b9e2-4560-881f-2ab43bc448b5/62e14e1891fe1e334c921dfd19a32a84/StreamMap_11_24/dummy_state
> does not exist
> at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637)
> at
> org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
> at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
> at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
> at
> org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
> at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
> at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
> at
> org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
> at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
> at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
> at
> org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
> at
> org.apache.flink.runtime.io.disk.iomanager.IOManager.shutdown(IOManager.java:109)
> at
> org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync.shutdown(IOManagerAsync.java:185)
> at
> org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync$1.run(IOManagerAsync.java:105)
> {code}
> This was the last message logged from that machine. I suspect two threads are
> trying to clean up the directories during shutdown?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)