[jira] [Commented] (STORM-3476) LocalizedResourceRetentionSet cleanup causing excessive load on Hadoop namenode

Jira Tue, 27 Aug 2019 13:33:12 -0700


    [ https://issues.apache.org/jira/browse/STORM-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917086#comment-16917086 ]


Stig Rohde Døssing commented on STORM-3476:
-------------------------------------------

I think this is causing the supervisor to keep trying to download blobs even 
after they've been removed from Nimbus, e.g. when you kill a topology. The 
supervisor then loops, repeatedly logging errors like 

        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
2019-08-27 22:25:31.509 o.a.s.l.AsyncLocalizer AsyncLocalizer Executor - 0 [WARN] Failed to download blob LOCAL TOPO BLOB TOPO_CODE test-1-1566937202 will try again in 100 ms
org.apache.storm.generated.KeyNotFoundException: null
        at org.apache.storm.generated.Nimbus$getBlobMeta_result$getBlobMeta_resultStandardScheme.read(Nimbus.java:25919) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
        at org.apache.storm.generated.Nimbus$getBlobMeta_result$getBlobMeta_resultStandardScheme.read(Nimbus.java:25887) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
        at org.apache.storm.generated.Nimbus$getBlobMeta_result.read(Nimbus.java:25818) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
        at org.apache.storm.thrift.TServiceClient.receiveBase(TServiceClient.java:88) ~[storm-shaded-deps-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
        at org.apache.storm.generated.Nimbus$Client.recv_getBlobMeta(Nimbus.java:794) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
        at org.apache.storm.generated.Nimbus$Client.getBlobMeta(Nimbus.java:781) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
        at org.apache.storm.blobstore.NimbusBlobStore.getBlobMeta(NimbusBlobStore.java:85) ~[storm-client-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
        at org.apache.storm.localizer.LocallyCachedTopologyBlob.getRemoteVersion(LocallyCachedTopologyBlob.java:127) ~[storm-server-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
        at org.apache.storm.localizer.AsyncLocalizer.lambda$downloadOrUpdate$10(AsyncLocalizer.java:265) ~[storm-server-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) [?:1.8.0_144]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_144]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
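
To illustrate the kind of behavior that would avoid this loop, here is a 
minimal sketch of a retry helper that treats a missing key as terminal while 
still retrying other failures. It is not Storm's actual code: the class 
BlobRetrySketch, the method downloadWithRetry, and the local 
KeyNotFoundException stand-in are all hypothetical names used only for 
illustration.

```java
import java.util.concurrent.Callable;

final class BlobRetrySketch {
    /** Hypothetical stand-in for org.apache.storm.generated.KeyNotFoundException. */
    static class KeyNotFoundException extends Exception {}

    /**
     * Runs the download, retrying failures that may be transient.
     * A KeyNotFoundException is rethrown immediately: once the blob has been
     * removed (for example because the topology was killed), retrying cannot succeed.
     */
    static <T> T downloadWithRetry(Callable<T> download, int maxAttempts, long backoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return download.call();
            } catch (KeyNotFoundException e) {
                // The blob is gone; give up instead of scheduling another attempt.
                throw e;
            } catch (Exception e) {
                // Possibly transient (network, timeout): wait and try again.
                last = e;
                Thread.sleep(backoffMs);
            }
        }
        throw last;
    }
}
```

With something along these lines, a killed topology's blob would produce one 
failure instead of a warning every 100 ms.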

> LocalizedResourceRetentionSet cleanup causing excessive load on Hadoop 
> namenode
> -------------------------------------------------------------------------------
>
>                 Key: STORM-3476
>                 URL: https://issues.apache.org/jira/browse/STORM-3476
>             Project: Apache Storm
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Aaron Gresch
>            Assignee: Aaron Gresch
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.1.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One of our local Hadoop devs noticed that our storm user was creating by far 
> the heaviest load on our production Hadoop cluster.  Looking at one of the 
> heaviest supervisor nodes, and comparing debug logs to the Hadoop audit log, 
> it looks like LocalizedResourceRetentionSet cleanup was constantly doing 
> opens and never deleting any files.
>  
> The frequency can be addressed by supervisor.localizer.cleanup.interval.ms, 
> but even so, it seems we will continually look for files to delete even when 
> the target size is acceptable, resulting in unnecessary calls to Hadoop.
>  
>  
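
The idea in the description above, skipping the expensive lookups when the 
cache is already within its target size, can be sketched as follows. This is 
an illustration under stated assumptions, not the LocalizedResourceRetentionSet 
implementation: RetentionCleanupSketch, cleanup, and the existsRemotely 
predicate (standing in for a call that ends up hitting Hadoop) are 
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class RetentionCleanupSketch {
    /**
     * Returns the keys selected for deletion.
     * existsRemotely is assumed to be expensive, so it is only consulted
     * while the total cached size is above targetSizeBytes.
     */
    static List<String> cleanup(Map<String, Long> sizesByKey,
                                long targetSizeBytes,
                                Predicate<String> existsRemotely) {
        long total = 0;
        for (long size : sizesByKey.values()) {
            total += size;
        }
        List<String> toDelete = new ArrayList<>();
        for (Map.Entry<String, Long> entry : sizesByKey.entrySet()) {
            if (total <= targetSizeBytes) {
                // Already within the target: stop without any further remote calls.
                break;
            }
            if (!existsRemotely.test(entry.getKey())) {
                toDelete.add(entry.getKey());
                total -= entry.getValue();
            }
        }
        return toDelete;
    }
}
```

When the cache is under the target size, this variant makes no remote calls at 
all, regardless of how often the cleanup interval fires.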



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
