[ https://issues.apache.org/jira/browse/STORM-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617296#comment-16617296 ]
michel hummel commented on STORM-2138:
--------------------------------------

This issue still exists in V1.2.2.

I think there is a race condition when a topology is stopping. The supervisor removes the local files:
{noformat}
2018-09-14 21:33:03.718 o.a.s.d.s.Container [INFO] Cleaning up 5485e983-34a9-46ad-8679-bbe79c110cba:32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.718 o.a.s.d.s.Container [INFO] GET worker-user for 32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.718 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers/32173038-42d8-40c7-94ce-9229e3177b6e/pids/69074
2018-09-14 21:33:03.718 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers/32173038-42d8-40c7-94ce-9229e3177b6e/heartbeats
2018-09-14 21:33:03.730 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers/32173038-42d8-40c7-94ce-9229e3177b6e/pids
2018-09-14 21:33:03.731 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers/32173038-42d8-40c7-94ce-9229e3177b6e/tmp
2018-09-14 21:33:03.731 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers/32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.732 o.a.s.d.s.Container [INFO] REMOVE worker-user 32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.732 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers-users/32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.734 o.a.s.d.s.BasicContainer [INFO] Removed Worker ID 32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.734 o.a.s.l.AsyncLocalizer [INFO] Released blob reference ipc-fci-nrt_Tsit-RCFD_2017100_02222018-09-14T21-22-51-429Z-100-1536960172 6702 Cleaning up BLOB references...
2018-09-14 21:33:03.735 o.a.s.l.AsyncLocalizer [INFO] Released blob reference ipc-fci-nrt_Tsit-RCFD_2017100_02222018-09-14T21-22-51-429Z-100-1536960172 6702 Cleaning up basic files...
2018-09-14 21:33:03.735 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/supervisor/stormdist/ipc-fci-nrt_Tsit-RCFD_2017100_02222018-09-14T21-22-51-429Z-100-1536960172
{noformat}
If the blobUpdateTimer is triggered at the same time [https://github.com/apache/storm/blob/v1.2.2/storm-core/src/jvm/org/apache/storm/daemon/supervisor/timer/UpdateBlobs.java], it looks for the topology configuration, which no longer exists:
{noformat}
2018-09-14 21:33:03.796 o.a.s.e.EventManagerImp [ERROR] {} Error when processing event
java.lang.RuntimeException: java.io.FileNotFoundException: File '/space/StormData/storm-data/supervisor/stormdist/ipc-fci-nrt_Tsit-RCFD_2017100_02222018-09-14T21-22-51-429Z-100-1536960172/stormconf.ser' does not exist
    at org.apache.storm.utils.Utils.wrapInRuntime(Utils.java:1571) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.run(UpdateBlobs.java:86) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:54) [storm-core-1.2.2.jar:1.2.2]
Caused by: java.io.FileNotFoundException: File '/space/StormData/storm-data/supervisor/stormdist/ipc-fci-nrt_Tsit-RCFD_2017100_02222018-09-14T21-22-51-429Z-100-1536960172/stormconf.ser' does not exist
    at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:292) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1815) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:262) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:374) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:368) ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.updateBlobsForTopology(UpdateBlobs.java:100)
        ~[storm-core-1.2.2.jar:1.2.2]
    at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.run(UpdateBlobs.java:76) ~[storm-core-1.2.2.jar:1.2.2]
    ... 1 more
{noformat}
I think this race condition is hard to handle without a lock. Maybe catching FileNotFoundException would be an acceptable fix?

> java.io.FileNotFoundException: stormconf.ser does not exist
> -----------------------------------------------------------
>
>                 Key: STORM-2138
>                 URL: https://issues.apache.org/jira/browse/STORM-2138
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.2
>            Reporter: Eddy
>            Priority: Major
>
> We are seeing problems in our storm topology whereby all our workers crash.
> The errors we see are
> 2016-10-07 09:49:33.599 o.a.s.d.supervisor [ERROR] Error on initialization of server mk-supervisor
> java.io.FileNotFoundException: File '/opt/storm_local/supervisor/stormdist/production_2016_09_13-1-1475831938/stormconf.ser' does not exist
>     at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:292)
>     at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1815)
>     at org.apache.storm.config$read_supervisor_storm_conf_given_path.invoke(config.clj:142)
>     at org.apache.storm.config$read_supervisor_storm_conf.invoke(config.clj:221)
>     at org.apache.storm.daemon.supervisor$add_blob_references.invoke(supervisor.clj:495)
>     at org.apache.storm.daemon.supervisor$fn__9307$exec_fn__2466__auto____9308.invoke(supervisor.clj:795)
>     at clojure.lang.AFn.applyToHelper(AFn.java:160)
>     at clojure.lang.AFn.applyTo(AFn.java:144)
>     at clojure.core$apply.invoke(core.clj:630)
>     at org.apache.storm.daemon.supervisor$fn__9307$mk_supervisor__9352.doInvoke(supervisor.clj:763)
>     at clojure.lang.RestFn.invoke(RestFn.java:436)
>     at org.apache.storm.daemon.supervisor$_launch.invoke(supervisor.clj:1200)
>     at org.apache.storm.daemon.supervisor$_main.invoke(supervisor.clj:1233)
>     at clojure.lang.AFn.applyToHelper(AFn.java:152)
>     at clojure.lang.AFn.applyTo(AFn.java:144)
>     at org.apache.storm.daemon.supervisor.main(Unknown Source)
> 2016-10-07 09:49:33.608 o.a.s.util [ERROR] Halting process: ("Error on initialization")
> java.lang.RuntimeException: ("Error on initialization")
>     at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341)
>     at clojure.lang.RestFn.invoke(RestFn.java:423)
>     at org.apache.storm.daemon.supervisor$fn__9307$mk_supervisor__9352.doInvoke(supervisor.clj:763)
>     at clojure.lang.RestFn.invoke(RestFn.java:436)
>     at org.apache.storm.daemon.supervisor$_launch.invoke(supervisor.clj:1200)
>     at org.apache.storm.daemon.supervisor$_main.invoke(supervisor.clj:1233)
>     at clojure.lang.AFn.applyToHelper(AFn.java:152)
>     at clojure.lang.AFn.applyTo(AFn.java:144)
>     at org.apache.storm.daemon.supervisor.main(Unknown Source)
> 2016-10-07 09:49:34.668 o.a.s.d.supervisor [INFO] Removing code for storm id production_2016_09_13-1-1475831938
> We have looked at https://github.com/apache/storm/pull/418 and
> https://issues.apache.org/jira/browse/STORM-130, which both show the first
> issue as being fixed - however we are still experiencing it in 1.0.2. The
> changes from the fixing commit
> (https://github.com/apache/storm/pull/418/commits/ccd28f8a356f468e66865fa9d9901b0a2628ec74)
> don't seem to be in the current version of the file
> (https://github.com/apache/storm/blob/v1.0.2/storm-core/src/clj/org/apache/storm/daemon/supervisor.clj).
> We get this often when resubmitting a topology, and our only workaround is to
> stop the topology, delete the whole /opt/storm_local directory (which is our
> storm.local.dir) and resubmit the topology. Often, the workers seem to be
> looking for stormconf.ser in the local directory of an old topology that
> isn't even running at the time.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
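
The workaround suggested in the comment above (tolerating a missing stormconf.ser instead of taking a lock) could be sketched roughly as follows. This is only an illustration and not Storm code: the class `TolerantConfRead` and the method `readIfPresent` are hypothetical names, and the sketch uses `java.nio.file.Files` rather than the shaded commons-io `FileUtils` seen in the stack trace, so the "file is gone" case surfaces as `NoSuchFileException` as well as `FileNotFoundException`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper (not part of Storm): reads a file that another thread may delete at any time. */
public class TolerantConfRead {

    /**
     * Returns the file contents, or Optional.empty() if the file no longer exists.
     * A missing file is interpreted as "the topology was already cleaned up",
     * so the caller can skip it instead of failing the whole timer event.
     * Any other I/O error is still propagated.
     */
    static Optional<byte[]> readIfPresent(Path conf) throws IOException {
        try {
            return Optional.of(Files.readAllBytes(conf));
        } catch (NoSuchFileException | FileNotFoundException e) {
            // Lost the race against the cleanup: nothing left to update.
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary directory standing in for supervisor/stormdist/<topology-id> (assumption).
        Path dir = Files.createTempDirectory("stormdist");
        Path conf = dir.resolve("stormconf.ser");

        Files.write(conf, new byte[] {1, 2, 3});
        System.out.println("before cleanup, present=" + readIfPresent(conf).isPresent());

        Files.delete(conf); // simulates the supervisor deleting the topology files
        System.out.println("after cleanup, present=" + readIfPresent(conf).isPresent());

        Files.delete(dir);
    }
}
```

A check-then-read (`Files.exists` followed by a read) would not close the race, because the file can disappear between the two calls; catching the exception at the read is what makes the skip safe. Whether silently skipping the topology is acceptable in `UpdateBlobs` is a design question for the Storm maintainers.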