[ 
https://issues.apache.org/jira/browse/STORM-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052518#comment-14052518
 ] 

Radim Kolar commented on STORM-339:
-----------------------------------

There are three ways to protect against OOM without having to acknowledge 
every message. This matters because Storm in ack mode has 10x lower throughput.

See the end of 
http://docs.jboss.org/hornetq/2.2.5.Final/user-manual/en/html/queue-attributes.html#queue-attributes.address-settings

1) use a ring buffer for receiving messages. If messages are processed too 
slowly, a newly arriving message replaces an older unprocessed message. This is 
not flow control, just protection against OOM. (type DROP)

2) implement flow control messages; something simple like the XON/XOFF protocol 
(http://en.wikipedia.org/wiki/Software_flow_control) should suffice (type BLOCK)

3) save messages to disk instead of throwing them away (type PAGE)
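
To make option 1 concrete, here is a minimal sketch of a bounded receive buffer 
that overwrites the oldest unprocessed message when it is full. The class name 
DropOldestBuffer and its methods are hypothetical; this is not Storm or HornetQ 
code, just an illustration of the DROP idea under the assumption that losing 
messages is acceptable when acking is disabled.

```java
import java.util.ArrayDeque;

/**
 * Hypothetical bounded receive buffer (not a Storm or HornetQ class).
 * When the buffer is full, the oldest unprocessed message is discarded
 * to make room, so memory use is capped at `capacity` messages.
 */
final class DropOldestBuffer<T> {
    private final ArrayDeque<T> queue;
    private final int capacity;
    private long dropped = 0;

    DropOldestBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.queue = new ArrayDeque<>(capacity);
    }

    /** Stores a message, evicting the oldest one if the buffer is full. */
    synchronized void offer(T message) {
        if (queue.size() == capacity) {
            queue.pollFirst(); // overwrite semantics: oldest message is lost
            dropped++;
        }
        queue.addLast(message);
    }

    /** Returns the oldest buffered message, or null if the buffer is empty. */
    synchronized T poll() {
        return queue.pollFirst();
    }

    synchronized int size() {
        return queue.size();
    }

    /** Number of messages discarded so far; useful as a metric. */
    synchronized long droppedCount() {
        return dropped;
    }
}
```

Exposing the drop counter as a metric would at least make the message loss 
visible to the operator.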

For inspiration, see 
http://docs.jboss.org/hornetq/2.2.5.Final/user-manual/en/html/flow-control.html
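
The XON/XOFF idea from option 2 could look roughly like the sketch below. It 
assumes a receiver-side queue with a high and a low watermark and a control 
channel back to the sender; XonXoffReceiver, Signal, and the watermark 
parameters are hypothetical names, not part of Storm's netty transport or of 
HornetQ.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;

/**
 * Hypothetical receiver with XON/XOFF style flow control.
 * It asks the sender to pause (XOFF) when the queue reaches the high
 * watermark and to resume (XON) once it drains to the low watermark.
 */
final class XonXoffReceiver<T> {
    enum Signal { XON, XOFF }

    private final ArrayDeque<T> queue = new ArrayDeque<>();
    private final int lowWatermark;
    private final int highWatermark;
    private final Consumer<Signal> controlChannel; // assumed path back to the sender
    private boolean paused = false;

    XonXoffReceiver(int lowWatermark, int highWatermark, Consumer<Signal> controlChannel) {
        if (lowWatermark < 0 || lowWatermark >= highWatermark) {
            throw new IllegalArgumentException("need 0 <= low < high");
        }
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
        this.controlChannel = controlChannel;
    }

    /** Called for each message arriving from the sender. */
    synchronized void onMessage(T message) {
        queue.addLast(message);
        if (!paused && queue.size() >= highWatermark) {
            paused = true;
            controlChannel.accept(Signal.XOFF);
        }
    }

    /** Called by the slow consumer; returns null if nothing is queued. */
    synchronized T take() {
        T message = queue.pollFirst();
        if (paused && queue.size() <= lowWatermark) {
            paused = false;
            controlChannel.accept(Signal.XON);
        }
        return message;
    }

    synchronized boolean isPaused() {
        return paused;
    }

    synchronized int size() {
        return queue.size();
    }
}
```

Messages already in flight when XOFF is sent are still accepted, so the high 
watermark has to leave headroom for them. The gap between the two watermarks 
keeps the receiver from toggling XON/XOFF on every message.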

> Severe memory leak to OOM when ackers disabled
> ----------------------------------------------
>
>                 Key: STORM-339
>                 URL: https://issues.apache.org/jira/browse/STORM-339
>             Project: Apache Storm (Incubating)
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>            Reporter: Jiahong Li
>
> Without any ackers enabled, a fast component will continuously leak memory and 
> cause OOM problems when the target component is slow. The OOM problem can be 
> reproduced by running this fast-slow-topology:
> https://github.com/Gvain/storm-perf-test/tree/fast-slow-topology
> with command:
> {code}
> $ storm jar storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar 
> com.yahoo.storm.perftest.Main --spout 1 --bolt 1 --workers 2 --testTime 600 
> --messageSize 6400
> {code}
> The worker childopts were set to {{-Xms2g -Xmx2g -Xmn512m ...}}.
> At the same time, the executed count of the target component falls far behind 
> the emitted count of the source component. I guess the netty client is 
> buffering too many messages in its message_queue because the target component 
> sends back OK/Failure responses too slowly. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
