[ https://issues.apache.org/jira/browse/STORM-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956647#comment-13956647 ]
ASF GitHub Bot commented on STORM-12:
-------------------------------------
GitHub user revans2 opened a pull request:
https://github.com/apache/incubator-storm/pull/57
STORM-12 Reduce thread usage of Netty transport.
Makes the Netty clients share a thread pool, and never block any of the
threads in the pool. All existing tests pass, so no new tests were added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/revans2/incubator-storm netty-thread-usage
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-storm/pull/57.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #57
----
commit 94c4d4d9e6c4ce736141668c585818214a9d26cf
Author: Robert (Bobby) Evans <[email protected]>
Date: 2014-04-01T15:32:31Z
STORM-12 Reduce thread usage of Netty transport.
----
> Reduce Thread Usage of Netty Transport
> --------------------------------------
>
> Key: STORM-12
> URL: https://issues.apache.org/jira/browse/STORM-12
> Project: Apache Storm (Incubating)
> Issue Type: Improvement
> Reporter: Robert Joseph Evans
>
> When users start to create large topologies, the Storm Netty messaging layer
> uses lots of threads. This has resulted in OOMs because the default ulimit
> on most Linux distros is around 4000 processes. It looks like the messaging
> layer wants one thread per server it is connected to, which means one thread
> for every other worker in the system.
> In one particular case we saw the following thread counts:
> 1 (Curator delay thread)
> 1 (Curator Event Processor)
> 1 (Finalizer)
> 1 (GC???)
> 1 (Storm messaging recv thread asking netty for messages)
> 1 (Thread pool polling on a Synchronous queue???)
> 1 (ZK Connection)
> 1 (ZK epoll)
> 2 (???)
> 2 (Netty epoll)
> 6 (Timer Thread)
> 15 (Disruptor consume batches)
> 104 (Netty Thread pool taking messages to be sent)
> and this process was dying with OOMs because it could not create any more
> Netty threads.
> Looking at the code, it appears that this comes from two different things.
> First, the Client code is using its own thread pool for each Client instead
> of sharing a thread pool. Second, the protocol itself blocks the thread in
> takeMessages() if there are no messages to send.
> So we need to make the thread pool shared between all of the clients and
> modify the protocol so that takeMessages() does not block. But with it not
> blocking, we also need a way to have Client.send write directly to the Channel
> in some situations so that the messages are still sent.
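
For illustration only, the two changes described above can be sketched in
plain Java without Netty. This is not the code from the pull request.
SketchClient, its wire list (standing in for the Channel) and the demo()
helper are hypothetical names, and a fixed two-thread executor stands in for
the shared Netty worker pool. The sketch assumes that "write directly in some
situations" can be modelled as: send() starts the write itself when no write
is in flight for that client.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class SharedPoolSketch {

    /** Hypothetical stand-in for a per-connection client; not Storm's real Client. */
    static final class SketchClient {
        private final ExecutorService sharedPool;   // one pool shared by every client
        private final List<String> wire;            // stands in for the Channel
        private final CountDownLatch delivered;     // lets the demo wait for all writes
        private final Queue<String> pending = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean flushing = new AtomicBoolean(false);

        SketchClient(ExecutorService sharedPool, List<String> wire, CountDownLatch delivered) {
            this.sharedPool = sharedPool;
            this.wire = wire;
            this.delivered = delivered;
        }

        /** Non-blocking: returns whatever is queued right now, possibly nothing. */
        List<String> takeMessages() {
            List<String> batch = new ArrayList<>();
            String m;
            while ((m = pending.poll()) != null) {
                batch.add(m);
            }
            return batch;
        }

        /** Enqueues the message and starts a write if none is in flight. */
        void send(String msg) {
            pending.add(msg);
            scheduleFlushIfIdle();
        }

        private void scheduleFlushIfIdle() {
            // Only one flush per client runs at a time; the task occupies a
            // pool thread only while there is work to do.
            if (flushing.compareAndSet(false, true)) {
                sharedPool.execute(this::flush);
            }
        }

        private void flush() {
            try {
                for (String m : takeMessages()) {
                    wire.add(m);
                    delivered.countDown();
                }
            } finally {
                flushing.set(false);
            }
            // A send() may have enqueued after the drain but before the reset;
            // re-check so that message is not stranded.
            if (!pending.isEmpty()) {
                scheduleFlushIfIdle();
            }
        }
    }

    /** Runs the given number of clients over one two-thread pool; returns everything written. */
    static List<String> demo(int clients, int perClient) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> wire = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch delivered = new CountDownLatch(clients * perClient);
        try {
            for (int c = 0; c < clients; c++) {
                SketchClient client = new SketchClient(pool, wire, delivered);
                for (int i = 0; i < perClient; i++) {
                    client.send("c" + c + "-m" + i);
                }
            }
            if (!delivered.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for writes");
            }
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(wire);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo(100, 3).size() + " messages sent using 2 pool threads");
    }
}
```

In this sketch the thread count stays at two no matter how many clients
exist, because no task ever parks in the pool waiting for messages. The
re-check at the end of flush() is what replaces the blocking wait: without
it, a message enqueued just as a flush finishes could sit in the queue until
the next send().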
--
This message was sent by Atlassian JIRA
(v6.2#6252)