[ 
https://issues.apache.org/jira/browse/STORM-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617296#comment-16617296
 ] 

michel hummel commented on STORM-2138:
--------------------------------------

This issue still exists in V1.2.2.

I think there is a race condition when a topology is stopping:

The supervisor removes the local files:

{noformat}
2018-09-14 21:33:03.718 o.a.s.d.s.Container [INFO] Cleaning up 5485e983-34a9-46ad-8679-bbe79c110cba:32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.718 o.a.s.d.s.Container [INFO] GET worker-user for 32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.718 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers/32173038-42d8-40c7-94ce-9229e3177b6e/pids/69074
2018-09-14 21:33:03.718 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers/32173038-42d8-40c7-94ce-9229e3177b6e/heartbeats
2018-09-14 21:33:03.730 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers/32173038-42d8-40c7-94ce-9229e3177b6e/pids
2018-09-14 21:33:03.731 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers/32173038-42d8-40c7-94ce-9229e3177b6e/tmp
2018-09-14 21:33:03.731 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers/32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.732 o.a.s.d.s.Container [INFO] REMOVE worker-user 32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.732 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/workers-users/32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.734 o.a.s.d.s.BasicContainer [INFO] Removed Worker ID 32173038-42d8-40c7-94ce-9229e3177b6e
2018-09-14 21:33:03.734 o.a.s.l.AsyncLocalizer [INFO] Released blob reference ipc-fci-nrt_Tsit-RCFD_2017100_02222018-09-14T21-22-51-429Z-100-1536960172 6702 Cleaning up BLOB references...
2018-09-14 21:33:03.735 o.a.s.l.AsyncLocalizer [INFO] Released blob reference ipc-fci-nrt_Tsit-RCFD_2017100_02222018-09-14T21-22-51-429Z-100-1536960172 6702 Cleaning up basic files...
2018-09-14 21:33:03.735 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /space/StormData/storm-data/supervisor/stormdist/ipc-fci-nrt_Tsit-RCFD_2017100_02222018-09-14T21-22-51-429Z-100-1536960172
{noformat}
If the blobUpdateTimer ([https://github.com/apache/storm/blob/v1.2.2/storm-core/src/jvm/org/apache/storm/daemon/supervisor/timer/UpdateBlobs.java]) fires at the same time, it tries to read the topology configuration (stormconf.ser), which no longer exists:

{noformat}
2018-09-14 21:33:03.796 o.a.s.e.EventManagerImp [ERROR] {} Error when processing event
java.lang.RuntimeException: java.io.FileNotFoundException: File '/space/StormData/storm-data/supervisor/stormdist/ipc-fci-nrt_Tsit-RCFD_2017100_02222018-09-14T21-22-51-429Z-100-1536960172/stormconf.ser' does not exist
at org.apache.storm.utils.Utils.wrapInRuntime(Utils.java:1571) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.run(UpdateBlobs.java:86) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:54) [storm-core-1.2.2.jar:1.2.2]
Caused by: java.io.FileNotFoundException: File '/space/StormData/storm-data/supervisor/stormdist/ipc-fci-nrt_Tsit-RCFD_2017100_02222018-09-14T21-22-51-429Z-100-1536960172/stormconf.ser' does not exist
at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:292) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1815) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:262) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:374) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:368) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.updateBlobsForTopology(UpdateBlobs.java:100) ~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.daemon.supervisor.timer.UpdateBlobs.run(UpdateBlobs.java:76) ~[storm-core-1.2.2.jar:1.2.2]
... 1 more
{noformat}
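The two logs show the window: cleanup deletes the stormdist directory at 21:33:03.735 and the timer fails to read stormconf.ser at 21:33:03.796. One way to close that window would be to serialize the two actors. The sketch below assumes a hypothetical per-topology lock shared by the cleanup path and the blob-update timer; the class and method names (TopologyDirLocks, cleanup, readConfIfPresent) are illustrative only and are not part of Storm.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Stream;

class TopologyDirLocks {
    // Hypothetical: one lock per topology id, shared by cleanup and the blob-update timer.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String topologyId) {
        return locks.computeIfAbsent(topologyId, id -> new ReentrantLock());
    }

    // Cleanup side: recursively delete <stormdistRoot>/<topologyId> while holding the lock.
    void cleanup(Path stormdistRoot, String topologyId) throws IOException {
        ReentrantLock lock = lockFor(topologyId);
        lock.lock();
        try {
            Path dir = stormdistRoot.resolve(topologyId);
            if (Files.exists(dir)) {
                try (Stream<Path> paths = Files.walk(dir)) {
                    // Reverse order visits children before their parent directories.
                    paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
                }
            }
        } finally {
            lock.unlock();
        }
    }

    // Timer side: under the same lock, re-check that stormconf.ser exists before reading.
    // Returns null if cleanup already ran. The blob update itself would also have to
    // run while the lock is held.
    byte[] readConfIfPresent(Path stormdistRoot, String topologyId) throws IOException {
        ReentrantLock lock = lockFor(topologyId);
        lock.lock();
        try {
            Path conf = stormdistRoot.resolve(topologyId).resolve("stormconf.ser");
            if (!Files.isRegularFile(conf)) {
                return null;
            }
            return Files.readAllBytes(conf);
        } finally {
            lock.unlock();
        }
    }
}
```

The existence check must happen after the lock is acquired; checking first and locking afterwards would reopen the same window. The lock map in this sketch never shrinks, and in the log above the cleanup is spread across Container, AsyncLocalizer and AdvancedFSOps while the timer is a separate component, which is part of what makes sharing one lock between them non-trivial.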
I think dealing with this race condition cleanly is hard without a lock. Maybe catching the FileNotFoundException would be an acceptable fix?

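A minimal sketch of the catch-based fix, assuming the timer handles topologies one at a time. In the 1.2.2 trace the FileNotFoundException is wrapped by Utils.wrapInRuntime in UpdateBlobs.run, so the catch would need to sit around each per-topology call, before that wrapping. UpdateBlobsSketch, readStormConf and updateAll are hypothetical names that only model the control flow; they are not Storm's API.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class UpdateBlobsSketch {
    // Hypothetical stand-in for ConfigUtils.readSupervisorStormConf: reads the raw bytes
    // of <stormdistRoot>/<topologyId>/stormconf.ser. FileInputStream throws
    // FileNotFoundException for a missing file, like commons-io readFileToByteArray.
    static byte[] readStormConf(File stormdistRoot, String topologyId) throws IOException {
        File conf = new File(new File(stormdistRoot, topologyId), "stormconf.ser");
        try (InputStream in = new FileInputStream(conf)) {
            return in.readAllBytes(); // Java 9+
        }
    }

    // Hypothetical timer body: each topology is handled on its own, so one whose local
    // files were just removed by cleanup is skipped instead of aborting the whole run.
    static List<String> updateAll(File stormdistRoot, List<String> topologyIds) {
        List<String> updated = new ArrayList<>();
        for (String topologyId : topologyIds) {
            try {
                byte[] conf = readStormConf(stormdistRoot, topologyId);
                // The real code would deserialize conf and refresh the topology's blobs here.
                updated.add(topologyId);
            } catch (FileNotFoundException e) {
                // Topology was cleaned up concurrently: nothing left to update.
                System.err.println("Skipping " + topologyId + ": " + e.getMessage());
            } catch (IOException e) {
                // Any other I/O failure is still surfaced as an error.
                throw new RuntimeException(e);
            }
        }
        return updated;
    }
}
```

The trade-off is that a missing stormconf.ser for a topology that is still supposed to be running would now only be logged rather than reported as an error.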

> java.io.FileNotFoundException: stormconf.ser does not exist
> -----------------------------------------------------------
>
>                 Key: STORM-2138
>                 URL: https://issues.apache.org/jira/browse/STORM-2138
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.2
>            Reporter: Eddy
>            Priority: Major
>
> We are seeing problems in our Storm topology where all our workers crash.
> The errors we see are:
> 2016-10-07 09:49:33.599 o.a.s.d.supervisor [ERROR] Error on initialization of server mk-supervisor
> java.io.FileNotFoundException: File '/opt/storm_local/supervisor/stormdist/production_2016_09_13-1-1475831938/stormconf.ser' does not exist
>         at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:292)
>         at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1815)
>         at org.apache.storm.config$read_supervisor_storm_conf_given_path.invoke(config.clj:142)
>         at org.apache.storm.config$read_supervisor_storm_conf.invoke(config.clj:221)
>         at org.apache.storm.daemon.supervisor$add_blob_references.invoke(supervisor.clj:495)
>         at org.apache.storm.daemon.supervisor$fn__9307$exec_fn__2466__auto____9308.invoke(supervisor.clj:795)
>         at clojure.lang.AFn.applyToHelper(AFn.java:160)
>         at clojure.lang.AFn.applyTo(AFn.java:144)
>         at clojure.core$apply.invoke(core.clj:630)
>         at org.apache.storm.daemon.supervisor$fn__9307$mk_supervisor__9352.doInvoke(supervisor.clj:763)
>         at clojure.lang.RestFn.invoke(RestFn.java:436)
>         at org.apache.storm.daemon.supervisor$_launch.invoke(supervisor.clj:1200)
>         at org.apache.storm.daemon.supervisor$_main.invoke(supervisor.clj:1233)
>         at clojure.lang.AFn.applyToHelper(AFn.java:152)
>         at clojure.lang.AFn.applyTo(AFn.java:144)
>         at org.apache.storm.daemon.supervisor.main(Unknown Source)
> 2016-10-07 09:49:33.608 o.a.s.util [ERROR] Halting process: ("Error on initialization")
> java.lang.RuntimeException: ("Error on initialization")
>         at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341)
>         at clojure.lang.RestFn.invoke(RestFn.java:423)
>         at org.apache.storm.daemon.supervisor$fn__9307$mk_supervisor__9352.doInvoke(supervisor.clj:763)
>         at clojure.lang.RestFn.invoke(RestFn.java:436)
>         at org.apache.storm.daemon.supervisor$_launch.invoke(supervisor.clj:1200)
>         at org.apache.storm.daemon.supervisor$_main.invoke(supervisor.clj:1233)
>         at clojure.lang.AFn.applyToHelper(AFn.java:152)
>         at clojure.lang.AFn.applyTo(AFn.java:144)
>         at org.apache.storm.daemon.supervisor.main(Unknown Source)
> 2016-10-07 09:49:34.668 o.a.s.d.supervisor [INFO] Removing code for storm id production_2016_09_13-1-1475831938
> We have looked at https://github.com/apache/storm/pull/418 and
> https://issues.apache.org/jira/browse/STORM-130, which both show the original
> issue as fixed; however, we are still experiencing it in 1.0.2. The
> changes from the fixing commit
> (https://github.com/apache/storm/pull/418/commits/ccd28f8a356f468e66865fa9d9901b0a2628ec74)
> don't seem to be in the current version of the file
> (https://github.com/apache/storm/blob/v1.0.2/storm-core/src/clj/org/apache/storm/daemon/supervisor.clj).
> We hit this often when resubmitting a topology, and our only workaround is to
> stop the topology, delete the whole /opt/storm_local directory (our
> storm.local.dir), and resubmit the topology. Often, the workers seem to be
> looking for stormconf.ser in the local directory of an old topology that
> isn't even running at the time.
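Before deleting the whole storm.local.dir, it may help to see which topology directories are actually incomplete. A minimal diagnostic sketch, assuming the layout visible in the paths above (`<storm.local.dir>/supervisor/stormdist/<topology-id>/stormconf.ser`); StaleStormdistCheck and findMissingConf are hypothetical helpers, not part of Storm.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class StaleStormdistCheck {
    // Lists topology ids under <stormLocalDir>/supervisor/stormdist whose directory
    // has no stormconf.ser. Returns an empty list if stormdist does not exist.
    static List<String> findMissingConf(Path stormLocalDir) throws IOException {
        Path stormdist = stormLocalDir.resolve("supervisor").resolve("stormdist");
        List<String> missing = new ArrayList<>();
        if (!Files.isDirectory(stormdist)) {
            return missing;
        }
        // Only look at subdirectories; each one is expected to be a topology id.
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(stormdist, Files::isDirectory)) {
            for (Path dir : dirs) {
                if (!Files.isRegularFile(dir.resolve("stormconf.ser"))) {
                    missing.add(dir.getFileName().toString());
                }
            }
        }
        return missing;
    }
}
```

For example, `findMissingConf(Paths.get("/opt/storm_local"))` would return the ids of directories such as production_2016_09_13-1-1475831938 if their stormconf.ser is gone. Directory iteration order is unspecified, so the result is not sorted.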



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
