[ https://issues.apache.org/jira/browse/STORM-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959017#comment-16959017 ]

Aaron Gresch commented on STORM-3523:
-------------------------------------

It turns out that handling this exception exposed the root cause for us....  With the exception, the supervisor would restart and fix itself. Handling the exception, we sometimes would see workers failing to start.

Investigating further, we found deadlocks in the AsyncLocalizer due to running with only 3 threads. This was just a setting we carried forward from our older pre-2.x clusters.

I think it would be best to revert this change and add a comment regarding deadlock possibilities.

It also appears that running the localizer this way was the actual root cause for STORM-3168. However, the change I made there still seems valid.

> supervisor restarts when releasing slot with missing file
> ---------------------------------------------------------
>
>                 Key: STORM-3523
>                 URL: https://issues.apache.org/jira/browse/STORM-3523
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Aaron Gresch
>            Assignee: Aaron Gresch
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 2.2.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> 2019-10-03 16:25:32.809 o.a.s.d.s.Slot SLOT_6719 [ERROR] Error when processing event
> java.io.FileNotFoundException: File 'x/storm/supervisor/stormdist/xxx-190213-004131-001-209-1550018519/stormconf.ser' does not exist
>     at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:297) ~[shaded-deps-2.0.1.y.jar:2.0.1.y]
>     at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1851) ~[shaded-deps-2.0.1.y.jar:2.0.1.y]
>     at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:308) ~[storm-client-2.0.1.y.jar:2.0.1.y]
>     at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:469) ~[storm-client-2.0.1.y.jar:2.0.1.y]
>     at org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:303) ~[storm-client-2.0.1.y.jar:2.0.1.y]
>     at org.apache.storm.localizer.AsyncLocalizer.getLocalResources(AsyncLocalizer.java:359) ~[storm-server-2.0.1.y.jar:2.0.1.y]
>     at org.apache.storm.localizer.AsyncLocalizer.releaseSlotFor(AsyncLocalizer.java:460) ~[storm-server-2.0.1.y.jar:2.0.1.y]
>     at org.apache.storm.daemon.supervisor.Slot.handleWaitingForBlobLocalization(Slot.java:435) ~[storm-server-2.0.1.y.jar:2.0.1.y]
>     at org.apache.storm.daemon.supervisor.Slot.stateMachineStep(Slot.java:229) ~[storm-server-2.0.1.y.jar:2.0.1.y]
>     at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:900) [storm-server-2.0.1.y.jar:2.0.1.y]
> 2019-10-03 16:25:32.810 o.a.s.u.Utils SLOT_6719 [ERROR] Halting process: Error when processing an event
> java.lang.RuntimeException: Halting process: Error when processing an event
>     at org.apache.storm.utils.Utils.exitProcess(Utils.java:550) [storm-client-2.0.1.y.jar:2.0.1.y]
>     at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:947) [storm-server-2.0.1.y.jar:2.0.1.y]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
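
For readers unfamiliar with how a small thread count can deadlock a localizer-style component, here is a minimal sketch of one common mechanism, thread-pool starvation. It is not the AsyncLocalizer code, and the comment above does not say this is the exact mechanism involved; the class `PoolStarvationSketch`, the method `completes`, and the "resource" strings are all hypothetical names made up for illustration. The assumption is simply that tasks running on a fixed-size pool block while waiting for follow-up tasks queued on that same pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not taken from Apache Storm.
public class PoolStarvationSketch {

    /**
     * Runs outerTasks tasks on a fixed pool of poolSize threads. Each outer
     * task submits an inner task to the SAME pool and blocks on its result.
     * Returns true if every outer task finishes within timeoutMs, false if
     * the pool is starved (or a task fails).
     */
    static boolean completes(int poolSize, int outerTasks, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // Makes the demo deterministic: no outer task submits its inner task
        // until all outer tasks are occupying a pool thread.
        CountDownLatch allOutersRunning = new CountDownLatch(outerTasks);
        List<Future<String>> outers = new ArrayList<>();
        try {
            for (int i = 0; i < outerTasks; i++) {
                final int id = i;
                outers.add(pool.submit(() -> {
                    allOutersRunning.countDown();
                    allOutersRunning.await();
                    Future<String> inner = pool.submit(() -> "resource-" + id);
                    // Blocks a pool thread until the inner task has run.
                    return inner.get();
                }));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            for (Future<String> outer : outers) {
                long remaining = Math.max(deadline - System.nanoTime(), 1);
                try {
                    outer.get(remaining, TimeUnit.NANOSECONDS);
                } catch (TimeoutException | ExecutionException e) {
                    return false;
                }
            }
            return true;
        } finally {
            // Interrupts any threads still blocked so the JVM can exit.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("3 threads, 3 outer tasks: " + completes(3, 3, 500));
        System.out.println("4 threads, 3 outer tasks: " + completes(4, 3, 500));
    }
}
```

With 3 threads and 3 outer tasks, every thread is held by a task waiting on work that can never be scheduled, so the call times out. With one extra thread the inner tasks get a chance to run and everything completes. Whether a larger pool is the right remedy in a real system, as opposed to avoiding blocking waits inside pool tasks, depends on the code in question.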