[ https://issues.apache.org/jira/browse/RATIS-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918138#comment-16918138 ]

Josh Elser commented on RATIS-485:
----------------------------------

{quote}Is the test creating a lot of RaftClient(s)? Each client has a TimeoutScheduler which may cause the OOM. Let's make the scheduler static to see if it could fix the OOM.
{quote}
You were right on the money, Nicholas.

The test isn't explicitly creating new RaftClients (in fact, we only have a single RaftClientImpl in the heap dump), but {{PeerProxyMap#resetProxy(..)}} creates a new {{GrpcClientProtocolClient}}, and each reset ends up orphaning the previous client's TimeoutScheduler.
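To illustrate the leak pattern, here is a minimal, hypothetical sketch. {{LeakyProxy}}, {{SharedProxy}} and the helper methods are made up for illustration and are not Ratis classes; it assumes each proxy owns a single-thread {{ScheduledExecutorService}} as a stand-in for the per-client TimeoutScheduler. Once a proxy is replaced, its worker thread keeps running, while a static scheduler stays at one thread no matter how many proxies are created.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: these classes only mimic the ownership pattern; they are not Ratis classes.
public class SchedulerLeakDemo {
  static final String LEAKY_NAME = "leaky-timeout";
  static final String SHARED_NAME = "shared-timeout";

  static ThreadFactory daemonFactory(String name) {
    return r -> {
      Thread t = new Thread(r, name);
      t.setDaemon(true); // let the JVM exit even though these threads are never shut down
      return t;
    };
  }

  /** Per-instance scheduler: a replaced proxy leaves its worker thread running. */
  static final class LeakyProxy {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(daemonFactory(LEAKY_NAME));

    void onTimeout(Runnable task) {
      // The first schedule() starts this executor's worker thread (same path as ensurePrestart in the trace).
      scheduler.schedule(task, 1, TimeUnit.SECONDS);
    }
  }

  /** Static scheduler: every proxy instance shares one worker thread. */
  static final class SharedProxy {
    private static final ScheduledExecutorService SCHEDULER =
        Executors.newSingleThreadScheduledExecutor(daemonFactory(SHARED_NAME));

    void onTimeout(Runnable task) {
      SCHEDULER.schedule(task, 1, TimeUnit.SECONDS);
    }
  }

  /** Counts live threads with the given name. */
  static long countThreads(String name) {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t.getName().equals(name))
        .count();
  }

  /** Simulates repeated proxy resets; returns how many new scheduler threads were left behind. */
  static long simulateLeakyResets(int resets) {
    long before = countThreads(LEAKY_NAME);
    for (int i = 0; i < resets; i++) {
      new LeakyProxy().onTimeout(() -> { }); // the old proxy is dropped, its thread is not
    }
    return countThreads(LEAKY_NAME) - before;
  }

  /** Same resets with the shared scheduler; returns the total number of shared scheduler threads. */
  static long simulateSharedResets(int resets) {
    for (int i = 0; i < resets; i++) {
      new SharedProxy().onTimeout(() -> { });
    }
    return countThreads(SHARED_NAME);
  }

  public static void main(String[] args) {
    System.out.println("leaky threads added:  " + simulateLeakyResets(100));
    System.out.println("shared threads total: " + simulateSharedResets(100));
  }
}
{code}

The leaky count grows by one per reset, which is the same way repeated {{resetProxy(..)}} calls would eventually exhaust native threads; the shared count stays at one.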

bq. I agree that we should not schedule another shutdown task when (1) there is already a shutdown task and (2) it is still valid. When the previous shutdown task becomes invalid, we should cancel it. Will work on a patch.

Actually, I have one here. One sec and I'll combine it with your patch :)
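A rough sketch of the "at most one valid shutdown task" idea, assuming a lazily started scheduler that shuts itself down after an idle grace period. {{GracefulScheduler}}, {{gracePeriodMs}}, {{isRunning()}} and {{hasPendingShutdown()}} are hypothetical names; this is not the actual Ratis TimeoutScheduler or the attached patch. A shutdown task is scheduled only when the scheduler goes idle and none is already pending, and new work cancels the pending one.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the Ratis TimeoutScheduler: a lazily started scheduler that shuts
// itself down after an idle grace period, with at most one valid shutdown task at a time.
public final class GracefulScheduler {
  private final long gracePeriodMs;
  private ScheduledExecutorService executor;   // null while shut down
  private ScheduledFuture<?> pendingShutdown;  // the single outstanding shutdown task, if any
  private int activeTasks;

  public GracefulScheduler(long gracePeriodMs) {
    this.gracePeriodMs = gracePeriodMs;
  }

  public synchronized void onTimeout(long delayMs, Runnable task) {
    if (pendingShutdown != null) {
      // New work makes the pending shutdown invalid, so cancel it instead of stacking another.
      pendingShutdown.cancel(false);
      pendingShutdown = null;
    }
    if (executor == null) {
      executor = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "graceful-timeout");
        t.setDaemon(true);
        return t;
      });
    }
    activeTasks++;
    executor.schedule(() -> {
      try {
        task.run();
      } finally {
        taskDone();
      }
    }, delayMs, TimeUnit.MILLISECONDS);
  }

  private synchronized void taskDone() {
    activeTasks--;
    // Schedule a shutdown only when idle and no valid shutdown task already exists.
    if (activeTasks == 0 && pendingShutdown == null) {
      final ScheduledExecutorService target = executor;
      pendingShutdown = target.schedule(() -> tryShutdown(target), gracePeriodMs, TimeUnit.MILLISECONDS);
    }
  }

  private synchronized void tryShutdown(ScheduledExecutorService target) {
    // Re-check under the lock: a cancelled shutdown task that was already running must not
    // shut down an executor that has picked up new work or has been replaced.
    if (target == executor && activeTasks == 0) {
      executor.shutdown();
      executor = null;
      pendingShutdown = null;
    }
  }

  public synchronized boolean isRunning() { return executor != null; }

  public synchronized boolean hasPendingShutdown() { return pendingShutdown != null; }

  public static void main(String[] args) throws InterruptedException {
    GracefulScheduler s = new GracefulScheduler(100);
    s.onTimeout(10, () -> System.out.println("timeout fired"));
    Thread.sleep(50);
    System.out.println("pending shutdown: " + s.hasPendingShutdown());
    s.onTimeout(10, () -> System.out.println("timeout fired again")); // cancels the pending shutdown, if one was queued
    Thread.sleep(300);
    System.out.println("running after grace period: " + s.isRunning());
  }
}
{code}

The re-check in {{tryShutdown}} matters because {{cancel(false)}} cannot stop a shutdown task that has already started running; the task must confirm it is still valid before acting.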

> Load Generator OOMs if Ratis Unavailable
> ----------------------------------------
>
>                 Key: RATIS-485
>                 URL: https://issues.apache.org/jira/browse/RATIS-485
>             Project: Ratis
>          Issue Type: Bug
>          Components: examples
>            Reporter: Clay B.
>            Priority: Trivial
>         Attachments: loadgen.log, r485_20190827.patch
>
>
> Running the load generator without a Ratis cluster (e.g., with spurious node IPs) results in an OOM.
> If one has a single Ratis server, it retries seemingly indefinitely:
> {code:java}
> vagrant@ratis-server:~/incubator-ratis$ ./ratis-examples/src/main/bin/client.sh filestore loadgen --size 1048576 --numFiles 100 --peers n0:127.0.0.1:1
> {code}
> If one has two Ratis servers it OOMs:
> {code:java}
> vagrant@ratis-server:~/incubator-ratis$ ./ratis-examples/src/main/bin/client.sh filestore loadgen --size 1048576 --numFiles 100 --peers n0:127.0.0.1:1,n1:127.0.0.1:2
> [...]
> 1/787867107@5e5792a0 with java.util.concurrent.CompletionException: java.io.IOException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2019-02-14 07:47:22 DEBUG RaftClient:417 - client-272A2E13A5DD: suggested new leader: null. Failed RaftClientRequest:client-272A2E13A5DD->n1@group-6F7570313233, cid=0, seq=0 RW, org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 with java.io.IOException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2019-02-14 07:47:22 DEBUG RaftClient:437 - client-272A2E13A5DD: change Leader from n1 to n0
> 2019-02-14 07:47:22 DEBUG RaftClient:291 - schedule attempt #10740 with policy RetryForeverNoSleep for RaftClientRequest:client-272A2E13A5DD->n1@group-6F7570313233, cid=0, seq=0 RW, org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0
> 2019-02-14 07:47:22 DEBUG RaftClient:323 - client-272A2E13A5DD: send* RaftClientRequest:client-272A2E13A5DD->n0@group-6F7570313233, cid=0, seq=0 RW, org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0
> 2019-02-14 07:47:22 DEBUG RaftClient:338 - client-272A2E13A5DD: Failed RaftClientRequest:client-272A2E13A5DD->n0@group-6F7570313233, cid=0, seq=0 RW, org.apache.ratis.examples.filestore.FileStoreClient$$Lambda$41/787867107@5e5792a0 with java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: unable to create new native thread
> Exception in thread "main" java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: unable to create new native thread
>         at org.apache.ratis.client.impl.RaftClientImpl.lambda$sendRequestAsync$14(RaftClientImpl.java:349)
>         at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>         at java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:884)
>         at java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2196)
>         at org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:334)
>         at org.apache.ratis.client.impl.RaftClientImpl.sendRequestWithRetryAsync(RaftClientImpl.java:286)
>         at org.apache.ratis.util.SlidingWindow$Client.sendOrDelayRequest(SlidingWindow.java:243)
>         at org.apache.ratis.util.SlidingWindow$Client.retry(SlidingWindow.java:259)
>         at org.apache.ratis.client.impl.RaftClientImpl.lambda$null$10(RaftClientImpl.java:293)
>         at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:85)
>         at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:104)
>         at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
>         at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:717)
>         at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>         at java.util.concurrent.ThreadPoolExecutor.ensurePrestart(ThreadPoolExecutor.java:1603)
>         at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:334)
>         at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
>         at org.apache.ratis.util.TimeoutScheduler.schedule(TimeoutScheduler.java:117)
>         at org.apache.ratis.util.TimeoutScheduler.onTimeout(TimeoutScheduler.java:104)
>         at org.apache.ratis.util.TimeoutScheduler.onTimeout(TimeoutScheduler.java:82)
>         at org.apache.ratis.util.TimeoutScheduler.onTimeout(TimeoutScheduler.java:134)
>         at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.onNext(GrpcClientProtocolClient.java:234)
>         at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequestAsync(GrpcClientRpc.java:71)
>         at org.apache.ratis.client.impl.RaftClientImpl.sendRequestAsync(RaftClientImpl.java:324)
>         ... 15 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
