[ https://issues.apache.org/jira/browse/STORM-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated STORM-2879:
----------------------------------
    Labels: patch pull-request-available  (was: patch)

> Supervisor collapse continuously when there is a expired assignment for overdue storm
> -------------------------------------------------------------------------------------
>
>                 Key: STORM-2879
>                 URL: https://issues.apache.org/jira/browse/STORM-2879
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core, storm-server
>    Affects Versions: 2.0.0, 1.x
>            Reporter: Yuzhao Chen
>            Priority: Critical
>              Labels: patch, pull-request-available
>             Fix For: 2.0.0
>
>
> Currently, when a topology is reassigned or killed in a cluster, the supervisor
> deletes 4 files for the overdue storm (topology):
> - storm-code
> - storm-ser
> - storm-jar
> - LocalAssignment
>
> Slot.java
>     static DynamicState cleanupCurrentContainer(DynamicState dynamicState, StaticState staticState, MachineState nextState) throws Exception {
>         assert(dynamicState.container != null);
>         assert(dynamicState.currentAssignment != null);
>         assert(dynamicState.container.areAllProcessesDead());
>
>         dynamicState.container.cleanUp();
>         staticState.localizer.releaseSlotFor(dynamicState.currentAssignment, staticState.port);
>         DynamicState ret = dynamicState.withCurrentAssignment(null, null);
>         if (nextState != null) {
>             ret = ret.withState(nextState);
>         }
>         return ret;
>     }
>
> But this cleanup is not transactional: if an exception occurs while deleting
> storm-code/ser/jar, an overdue local assignment is left on disk.
> When the supervisor restarts after that exception, the slots are initialized
> and the containers are recovered from the LocalAssignments. The blob store
> client then tries to fetch the files from Nimbus/Master, gets a
> KeyNotFoundException, and the supervisor collapses again.
> This happens continuously, and the supervisor never recovers until we clean up
> all the local assignments manually.
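One way to picture the ordering problem described above is the sketch below. It is not the patch attached to this issue and does not use Storm's classes: SlotCleanupSketch, TopologyFileDeleter, and the "local-assignment" file name are hypothetical stand-ins. The idea it illustrates is to remove the persisted assignment record before the topology files, so that a failure halfway through leaves nothing for a restarted supervisor to recover.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; these names are NOT Storm's Slot/Container API.
class SlotCleanupSketch {

    /** Stand-in for the step that deletes storm-code/ser/jar. */
    interface TopologyFileDeleter {
        void delete(Path topologyDir) throws IOException;
    }

    /**
     * Removes the persisted assignment record first, then the topology files.
     * If the second step throws, the record is already gone, so a restart
     * finds no assignment to recover and requests no missing blobs.
     */
    static void cleanUp(Path assignmentRecord, Path topologyDir,
                        TopologyFileDeleter deleter) throws IOException {
        Files.deleteIfExists(assignmentRecord);
        deleter.delete(topologyDir);
    }

    /** Simulates a failing file deletion and reports whether the record is gone. */
    static boolean recordGoneAfterFailedCleanup() {
        try {
            Path slotDir = Files.createTempDirectory("slot");
            // "local-assignment" is an illustrative file name, not Storm's.
            Path record = Files.createFile(slotDir.resolve("local-assignment"));
            Path topologyDir = Files.createTempDirectory("stormdist");
            try {
                cleanUp(record, topologyDir, dir -> {
                    throw new IOException("simulated failure deleting " + dir);
                });
                return false; // the simulated failure should have propagated
            } catch (IOException expected) {
                return !Files.exists(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("record gone after failed cleanup: "
                + recordGoneAfterFailedCleanup());
    }
}
```

The trade-off of this ordering is that a failure can leave orphaned topology files on disk, which is harmless for recovery and can be swept up later, whereas the reverse ordering leaves an assignment that points at blobs Nimbus no longer has.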
>
> This is the stack:
> 2017-12-27 14:15:04.434 o.a.s.l.AsyncLocalizer [INFO] Cleaning up unused topologies in /opt/meituan/storm/data/supervisor/stormdist
> 2017-12-27 14:15:04.434 o.a.s.d.s.AdvancedFSOps [INFO] Deleting path /opt/meituan/storm/data/supervisor/stormdist/app_dpsr_realtime_shop_vane_allcates-14-1513685785
> 2017-12-27 14:15:04.445 o.a.s.d.s.Slot [INFO] STATE EMPTY msInState: 109 -> WAITING_FOR_BASIC_LOCALIZATION msInState: 1
> 2017-12-27 14:15:04.471 o.a.s.d.s.Supervisor [INFO] Starting supervisor with id 255d3fed-f3ee-4c7e-8a08-b693c9a6a072 at host gq-data-rt48.gq.sankuai.com.
> 2017-12-27 14:15:04.502 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:123) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
>     at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:04.611 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:123) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
>     at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:04.718 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormcode.ser from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:124) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
>     at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:04.825 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormcode.ser from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:124) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
>     at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:04.932 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormconf.ser from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:125) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
>     at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:05.039 o.a.s.u.Utils [ERROR] An exception happened while downloading /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormconf.ser from blob store.
> org.apache.storm.generated.KeyNotFoundException: null
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26656) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result$beginBlobDownload_resultStandardScheme.read(Nimbus.java:26624) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$beginBlobDownload_result.read(Nimbus.java:26555) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.recv_beginBlobDownload(Nimbus.java:864) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.generated.Nimbus$Client.beginBlobDownload(Nimbus.java:851) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.blobstore.NimbusBlobStore.getBlob(NimbusBlobStore.java:357) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorAttempt(Utils.java:598) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisorImpl(Utils.java:582) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.Utils.downloadResourcesAsSupervisor(Utils.java:574) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:125) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.2-mt001.jar:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
>     at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
> 2017-12-27 14:15:05.140 o.a.s.u.Utils [INFO] Could not extract resources from /opt/meituan/storm/data/supervisor/tmp/ca4f8174-59be-40a4-b431-dbc8b697f063/stormjar.jar
> 2017-12-27 14:15:05.142 o.a.s.d.s.Slot [INFO] STATE WAITING_FOR_BASIC_LOCALIZATION msInState: 697 -> WAITING_FOR_BLOB_LOCALIZATION msInState: 0
> 2017-12-27 14:15:05.142 o.a.s.l.AsyncLocalizer [WARN] Caught Exception While Downloading (rethrowing)...
> java.io.FileNotFoundException: File '/opt/meituan/storm/data/supervisor/stormdist/app_dpsr_realtime_shop_vane_allcates-14-1513685785/stormconf.ser' does not exist
>     at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:292) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1815) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:264) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:376) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:370) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBlobs.call(AsyncLocalizer.java:226) ~[storm-core-1.1.2-mt001.jar:?]
>     at org.apache.storm.localizer.AsyncLocalizer$DownloadBlobs.call(AsyncLocalizer.java:213) ~[storm-core-1.1.2-mt001.jar:?]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_76]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_76]
>     at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
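
The trace shows the recovery path rethrowing once the blobs for a recovered assignment cannot be found. A complementary idea, sketched below under stated assumptions, is to treat a missing blob key during recovery as a sign that the saved assignment is stale and to discard it rather than crash. Everything here is hypothetical: the nested KeyNotFoundException is a local stand-in for org.apache.storm.generated.KeyNotFoundException, BlobSource and recoverSlot are invented for illustration, the "topologyId-fileName" key layout and port 6700 are arbitrary, and this is not the change made for this issue.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are Storm's.
class StaleAssignmentRecoverySketch {

    /** Local stand-in for org.apache.storm.generated.KeyNotFoundException. */
    static class KeyNotFoundException extends Exception {
        KeyNotFoundException(String key) {
            super("no blob for key " + key);
        }
    }

    /** Stand-in for a blob store client that talks to Nimbus. */
    interface BlobSource {
        byte[] fetch(String key) throws KeyNotFoundException;
    }

    /** The three files the log above tries to download. */
    static final List<String> BASE_BLOBS =
            Arrays.asList("stormjar.jar", "stormcode.ser", "stormconf.ser");

    /**
     * Tries to recover the slot on the given port from its saved assignment.
     * Returns true if all base blobs were fetched. If any key is missing,
     * the saved assignment is dropped and false is returned, so the next
     * restart starts from an empty slot instead of failing again.
     */
    static boolean recoverSlot(int port, Map<Integer, String> savedAssignments,
                               BlobSource blobs) {
        String topologyId = savedAssignments.get(port);
        if (topologyId == null) {
            return false; // nothing to recover
        }
        try {
            for (String name : BASE_BLOBS) {
                blobs.fetch(topologyId + "-" + name); // illustrative key layout
            }
            return true;
        } catch (KeyNotFoundException e) {
            savedAssignments.remove(port); // stale: topology is gone from Nimbus
            return false;
        }
    }

    /** Runs recovery against a blob source that has no keys at all. */
    static boolean staleAssignmentIsDiscarded() {
        Map<Integer, String> saved = new HashMap<>();
        saved.put(6700, "app_dpsr_realtime_shop_vane_allcates-14-1513685785");
        BlobSource emptyNimbus = key -> {
            throw new KeyNotFoundException(key);
        };
        boolean recovered = recoverSlot(6700, saved, emptyNimbus);
        return !recovered && saved.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("stale assignment discarded: "
                + staleAssignmentIsDiscarded());
    }
}
```

A real implementation would need to distinguish a topology that is permanently gone from a Nimbus that is temporarily unreachable; this sketch only covers the first case.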