[
https://issues.apache.org/jira/browse/STORM-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917086#comment-16917086
]
Stig Rohde Døssing commented on STORM-3476:
-------------------------------------------
I think this is causing the supervisor to keep trying to download blobs even
after they're removed from Nimbus. This happens, for example, when you kill a
topology. The supervisor repeatedly logs output like:
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
2019-08-27 22:25:31.509 o.a.s.l.AsyncLocalizer AsyncLocalizer Executor - 0 [WARN] Failed to download blob LOCAL TOPO BLOB TOPO_CODE test-1-1566937202 will try again in 100 ms
org.apache.storm.generated.KeyNotFoundException: null
    at org.apache.storm.generated.Nimbus$getBlobMeta_result$getBlobMeta_resultStandardScheme.read(Nimbus.java:25919) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
    at org.apache.storm.generated.Nimbus$getBlobMeta_result$getBlobMeta_resultStandardScheme.read(Nimbus.java:25887) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
    at org.apache.storm.generated.Nimbus$getBlobMeta_result.read(Nimbus.java:25818) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
    at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:88) ~[storm-shaded-deps-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
    at org.apache.storm.generated.Nimbus$Client.recv_getBlobMeta(Nimbus.java:794) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
    at org.apache.storm.generated.Nimbus$Client.getBlobMeta(Nimbus.java:781) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
    at org.apache.storm.blobstore.NimbusBlobStore.getBlobMeta(NimbusBlobStore.java:85) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
    at org.apache.storm.localizer.LocallyCachedTopologyBlob.getRemoteVersion(LocallyCachedTopologyBlob.java:127) ~[storm-server-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
    at org.apache.storm.localizer.AsyncLocalizer.lambda$downloadOrUpdate$10(AsyncLocalizer.java:265) ~[storm-server-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) [?:1.8.0_144]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_144]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
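
To illustrate the behavior described above, here is a minimal sketch of a
download retry loop that stops as soon as the blob is reported missing instead
of retrying forever. It is not Storm's actual AsyncLocalizer code: the class
BlobRetrySketch, the Download interface, the Outcome enum, and the local
KeyNotFoundException stand-in are all hypothetical names chosen for the
example, and the attempt limit and backoff are assumptions.

```java
public class BlobRetrySketch {
    /** Hypothetical stand-in for org.apache.storm.generated.KeyNotFoundException. */
    public static class KeyNotFoundException extends Exception {}

    /** Possible results of the retry loop. */
    public enum Outcome { DOWNLOADED, BLOB_GONE, GAVE_UP }

    /** Hypothetical download action that may fail with any exception. */
    public interface Download {
        void run() throws Exception;
    }

    /**
     * Runs the download up to maxAttempts times, sleeping backoffMs between
     * failed attempts. A KeyNotFoundException ends the loop immediately,
     * because a blob that was removed remotely cannot be fetched by retrying.
     */
    public static Outcome downloadWithRetry(Download download, int maxAttempts, long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                download.run();
                return Outcome.DOWNLOADED;
            } catch (KeyNotFoundException e) {
                // The blob no longer exists; do not schedule another attempt.
                return Outcome.BLOB_GONE;
            } catch (Exception e) {
                // Treat every other failure as possibly transient.
                if (attempt == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    return Outcome.GAVE_UP;
                }
            }
        }
        return Outcome.GAVE_UP;
    }
}
```

With a loop shaped like this, killing a topology would produce a single
"blob gone" result per blob rather than a warning every 100 ms.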
> LocalizedResourceRetentionSet cleanup causing excessive load on Hadoop
> namenode
> -------------------------------------------------------------------------------
>
> Key: STORM-3476
> URL: https://issues.apache.org/jira/browse/STORM-3476
> Project: Apache Storm
> Issue Type: Improvement
> Affects Versions: 2.0.0
> Reporter: Aaron Gresch
> Assignee: Aaron Gresch
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.1.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> One of our local Hadoop devs noticed our storm user was creating by far
> the heaviest load on our production Hadoop cluster. Looking at one of the
> heaviest supervisor nodes, and comparing debug logs to the Hadoop audit log,
> it looks like LocalizedResourceRetentionSet cleanup was constantly doing
> opens and never deleting any files.
>
> The frequency can be addressed by supervisor.localizer.cleanup.interval.ms,
> but even so, it seems we will continually look for files to delete even when
> the target size is acceptable, resulting in unnecessary calls to Hadoop.
>
>