Hi Malinga,

This sounds great, and the fixes look very good! I will come over, and let's discuss how we keep the architecture clean.
Very nice job! As you mentioned in the other thread, we now need perf numbers.

--Srinath

On Sun, May 26, 2013 at 9:58 PM, Malinga Purnasiri <[email protected]> wrote:
> Hi,
>
> While working on the OOM issue in the MB, I found some design limitations
> that lead to it. Let me summarize my findings in point form.
>
> * Static executor pool (org.wso2.andes.pool.AndesExecuter.java) issue
> Unfortunately, MB has a static executor pool to which we submit all the
> runnables that need to run in parallel. When we send messages in a burst,
> the pool is soon exhausted (with runnables), the internal queue backing
> the executor pool grows, and that leads to OOM. This was observed from
> heap dump analysis with MAT.
>
> Solution: Group the runnables by task nature and send them to different
> executor pools.
>
> * The way we insert data into Cassandra
> Currently, most of the time we create mutators and execute them on the
> fly. I have run some benchmarks that simulate this situation. Here is a
> small code block for that:
>
> for (int i = 0; i < 1000000; i++) {
>     Mutator<String> messageMutator =
>         HFactory.createMutator(keyspaceOperator, stringSerializer);
>     messageMutator.addInsertion(..);
>     messageMutator.execute();
> }
>
> When we loop and execute on the fly like this, after a few thousand
> iterations each insertion takes far longer than expected. According to
> the Cassandra documentation, this also has the side effect of sending
> many small messages over the network, which exhausts network bandwidth
> too.
>
> Solution: Operate in batch mode, e.g. BatchMutation.
>
> * Message accumulation in LinkedBlockingQueue (observed from heap dump
> with MAT)
> Inside the CassendraMessageStore we use a LinkedBlockingQueue to hold
> messages temporarily until we insert them into Cassandra. There we have
> a huge bottleneck: the producer inserts into the queue very fast, but
> the consumer end is very slow. So the blocking queue keeps growing and
> creates an OOM.
>
> Solution: Add a BatchMutation mode at the consumer end to make the
> consumer fast, so the queue holds fewer messages at any given time.
>
> * PublishMessageWriter's run method executes serially
> Inside the thread, we take messages one by one and try to insert them
> into Cassandra, but a burst of messages creates a bottleneck here. We
> must introduce more parallelism.
>
> ---------
>
> Any ideas on this?
>
> Note: I have made the above code changes, and now MB can take messages
> in a burst and run without OOM. But we still need to design and
> implement this to production quality.
>
> --
> Malinga Pathmal,
> Technical Lead, WSO2, Inc. : http://wso2.com/
> Phone : (+94) 715335898

--
============================
Srinath Perera, Ph.D.
Senior Software Architect, WSO2 Inc.
Visiting Faculty, University of Moratuwa
Member, Apache Software Foundation
Research Scientist, Lanka Software Foundation

Blog: http://srinathsview.blogspot.com/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
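[Editor's note] The executor-pool grouping Malinga proposes could look roughly like the sketch below, using plain java.util.concurrent (class and group names are illustrative, not the actual MB code). A bounded queue plus a caller-runs rejection policy keeps a burst from growing an unbounded internal queue toward OOM:

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: one bounded executor per task group instead of a single
// shared static pool. Group names ("metadata", "content") are made up.
public class GroupedExecutors {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

    private ExecutorService newPool() {
        // Bounded queue + CallerRunsPolicy: when a group's queue fills up
        // during a burst, the submitting thread runs the task itself,
        // which throttles producers instead of queuing without limit.
        return new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(1000),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // One pool per task group, created lazily on first use.
    public void submit(String group, Runnable task) {
        pools.computeIfAbsent(group, g -> newPool()).submit(task);
    }

    public void shutdown() {
        for (ExecutorService pool : pools.values()) {
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        GroupedExecutors executors = new GroupedExecutors();
        executors.submit("metadata", () -> System.out.println("metadata task"));
        executors.submit("content", () -> System.out.println("content task"));
        executors.shutdown();
    }
}
```

The point of separate pools is isolation: a burst of slow content writes can no longer starve the pool that metadata tasks depend on.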
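[Editor's note] On the batching point: the benchmark loop above pays one network round trip per message, because execute() is called inside the loop. The fix is to accumulate insertions and execute once per batch (with Hector, roughly many addInsertion calls followed by a single execute). A minimal stand-alone sketch of the pattern, with a fake store standing in for Cassandra and all names illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batch-write fix: accumulate insertions and flush them in
// fixed-size batches, so N messages cost N/batchSize round trips, not N.
public class BatchWriteSketch {
    // Stand-in for the Cassandra mutator; counts "network" round trips.
    public static class FakeStore {
        public int roundTrips = 0;
        public final List<String> rows = new ArrayList<>();
        public void execute(List<String> batch) {
            roundTrips++;           // each execute() == one round trip
            rows.addAll(batch);
        }
    }

    public static int writeInBatches(FakeStore store, List<String> messages,
                                     int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        for (String message : messages) {
            batch.add(message);
            if (batch.size() == batchSize) {
                store.execute(batch);               // flush one full batch
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            store.execute(batch);                   // flush the remainder
        }
        return store.roundTrips;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < 1000; i++) messages.add("msg-" + i);
        int trips = writeInBatches(store, messages, 100);
        System.out.println(trips);  // 10 round trips instead of 1000
    }
}
```

The same shape applies whether the flush is a Hector BatchMutation or a single Mutator whose execute() is moved outside the loop.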
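[Editor's note] For the last two points (the slow consumer behind the LinkedBlockingQueue, and PublishMessageWriter running serially), one common pattern is to drain the queue in batches and hand each batch to a writer pool, so persistence is both batched and parallel. A sketch under those assumptions; the class and method names are illustrative, not the MB's:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Sketch: a writer that batch-drains the incoming queue and persists
// each batch on a thread pool instead of one message at a time.
public class BatchDrainingWriter {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ExecutorService writers = Executors.newFixedThreadPool(4);

    public void publish(String message) {
        queue.add(message);
    }

    // Block for the first message, then grab whatever else is already
    // queued, up to batchSize. One take() per batch, not one per message.
    public List<String> nextBatch(int batchSize) throws InterruptedException {
        List<String> batch = new ArrayList<>(batchSize);
        batch.add(queue.take());
        queue.drainTo(batch, batchSize - 1);
        return batch;
    }

    // One iteration of the writer loop: hand a whole batch to the pool so
    // several batches can be persisted concurrently.
    public void drainOnce(Consumer<List<String>> persist) throws InterruptedException {
        final List<String> batch = nextBatch(100);
        writers.submit(() -> persist.accept(batch));
    }

    public void shutdown() {
        writers.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        BatchDrainingWriter writer = new BatchDrainingWriter();
        for (int i = 0; i < 5; i++) writer.publish("message-" + i);
        writer.drainOnce(batch ->
                System.out.println("persisting " + batch.size() + " messages"));
        writer.shutdown();
    }
}
```

Because the consumer now removes many messages per pass and writes them concurrently, the queue depth stays bounded in practice even under bursts, which is exactly the OOM symptom described above.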
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
