This is a copy of a post I put up on Stack Overflow:
http://stackoverflow.com/questions/23403335/akka-pulling-pattern-vs-durable-mailboxes

I've been working on a project that uses Akka to build a real-time
processing system. It takes in the Twitter stream (for now) and uses
actors to process the incoming messages in various ways. I've been reading
about similar architectures that others have built with Akka, and this
blog post caught my eye:

http://blog.goconspire.com/post/64901258135/akka-at-conspire-part-5-the-importance-of-pulling

The post explains several issues that arise when pushing work (i.e.
messages) to actors versus having the actors pull work. To paraphrase the
article: when messages are pushed, there is no built-in way to know which
units of work were received by which worker, nor can that be reliably
tracked. In addition, if a worker suddenly receives a large number of
messages and each message is quite large, its mailbox can grow until the
machine runs out of memory. If the processing is CPU intensive, the node
could become unresponsive due to CPU thrashing. Furthermore, if the JVM
crashes, you lose all the messages that the actors had in their
mailboxes.

Pulling messages largely eliminates these problems. Since each worker must
pull work from a coordinator, the coordinator always knows which unit of
work each worker has; if a worker dies, the coordinator knows which unit of
work to re-process. Messages also don't sit in the workers' mailboxes (each
worker pulls a single message and processes it before pulling another), so
losing a mailbox when an actor crashes isn't an issue. And since each
worker only requests more work once it completes its current task, there
is no concern about a worker receiving or starting more work than it can
handle concurrently. Obviously this solution has issues of its own, such
as what happens when the coordinator itself crashes, but for now let's
assume that is a non-issue. More about this pulling pattern can be found
on the "Let It Crash" website, which the blog post references:

http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
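
To make the coordinator's bookkeeping concrete, here is a minimal sketch
in plain Java (no Akka dependency). The class and method names
(PullCoordinator, requestWork, workDone, workerDied) are made up for
illustration and are not taken from either blog post. In an Akka system
these calls would be messages between actors, and workerDied would
typically be triggered by a Terminated message from a watched worker.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

/** Hypothetical coordinator: hands out one unit per worker and remembers who holds what. */
class PullCoordinator {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Map<String, String> inFlight = new HashMap<>(); // workerId -> unit of work

    PullCoordinator(List<String> work) {
        pending.addAll(work);
    }

    /** Called by an idle worker. Empty if the worker is still busy or nothing is pending. */
    Optional<String> requestWork(String workerId) {
        if (inFlight.containsKey(workerId) || pending.isEmpty()) {
            return Optional.empty();
        }
        String unit = pending.poll();
        inFlight.put(workerId, unit);
        return Optional.of(unit);
    }

    /** Called by a worker once it has finished its current unit. */
    void workDone(String workerId) {
        inFlight.remove(workerId);
    }

    /** Called when a worker is detected as dead: its unit goes back to the pending queue. */
    void workerDied(String workerId) {
        String unit = inFlight.remove(workerId);
        if (unit != null) {
            pending.add(unit);
        }
    }

    int pendingCount() { return pending.size(); }

    int inFlightCount() { return inFlight.size(); }
}
```

Because a worker only ever holds the one unit recorded in inFlight, a
dead worker costs at most a re-run of that unit. The sketch does nothing
about the coordinator itself crashing, which is the open issue mentioned
above.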

This got me thinking about a possible alternative to the pulling pattern:
keep pushing, but use durable mailboxes. The example I had in mind is a
mailbox backed by RabbitMQ (other data stores like Redis, MongoDB, Kafka,
etc. would also work here), where all the routees behind a given router
(which all serve the same purpose) share the same message queue (or the
same DB, collection, etc., depending on the data store used). In other
words, each router would have its own queue in RabbitMQ serving as the
mailbox for its routees. This way, if one of the routees goes down, those
that are still up can simply keep retrieving from RabbitMQ, without much
worry that the queue will overflow, since they are no longer using typical
in-memory mailboxes. Also, since the mailbox isn't held in memory, the most
a crashing routee could lose is the single message it was processing at
the time of the crash. If the whole router goes down, you could expect
RabbitMQ (or whatever data store is being used) to absorb the increased
backlog until the router recovers and starts processing messages again.
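
The "lose at most one message" claim can be illustrated with a small
single-threaded simulation in plain Java. This is only a sketch: an
in-memory ArrayDeque stands in for the RabbitMQ queue, and Routee,
takeNext, finish and crash are hypothetical names rather than Akka or
RabbitMQ APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical routee that holds at most one message from the shared queue at a time. */
class Routee {
    final List<String> processed = new ArrayList<>();
    private final Queue<String> shared; // in-memory stand-in for the broker queue (assumption)
    private String current;             // the single message currently being processed

    Routee(Queue<String> shared) {
        this.shared = shared;
    }

    /** Fetches the next message; returns false when the shared queue is empty. */
    boolean takeNext() {
        current = shared.poll();
        return current != null;
    }

    /** Marks the current message as fully processed. */
    void finish() {
        processed.add(current);
        current = null;
    }

    /** Simulates a crash mid-processing and returns the message that is lost. */
    String crash() {
        String lost = current;
        current = null;
        return lost;
    }
}

class SharedQueueDemo {
    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>(Arrays.asList("m1", "m2", "m3", "m4"));
        Routee a = new Routee(queue);
        Routee b = new Routee(queue);

        a.takeNext();
        a.finish();               // a completes m1
        a.takeNext();             // a is now holding m2
        String lost = a.crash();  // a dies; only m2 is lost

        while (b.takeNext()) {    // b keeps draining the shared queue
            b.finish();
        }
        System.out.println("lost=" + lost + ", b processed=" + b.processed);
    }
}
```

Running main prints lost=m2, b processed=[m3, m4]: the surviving routee
drains everything except the one in-flight message. With a real broker
even that message need not be lost. RabbitMQ, for instance, redelivers
messages that a consumer received but never acknowledged once that
consumer's connection goes away, provided manual acknowledgements are
used.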

As for durable mailboxes, it seems that back in version 2.0 Akka was
moving towards supporting them more actively, since it shipped a few
implementations that worked with MongoDB, ZooKeeper, etc. However, it seems
that for whatever reason the idea was abandoned at some point, since the
latest version (2.3.2 as of the writing of this post) deprecated them.
You can still write your own mailbox by implementing the MessageQueue
interface, which gives you methods like enqueue(), dequeue(), etc., so
making one that works with RabbitMQ, MongoDB, Redis, etc. doesn't seem
like it would be a problem.
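
As a rough idea of what such a store-backed queue could look like, here
is a sketch in plain Java. It is not Akka's actual MessageQueue trait:
the real one works with Envelope and ActorRef and also has a clean-up
hook, all of which are left out here. ExternalStore, InMemoryStore and
StoreBackedQueue are hypothetical names, and the in-memory store only
exists so the example runs without a broker.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical minimal view of an external store (RabbitMQ, Redis, MongoDB, ...). */
interface ExternalStore {
    void append(byte[] payload);
    byte[] removeFirst(); // returns null when the store is empty
    int size();
}

/** In-memory stand-in so the sketch runs without a real broker (assumption). */
class InMemoryStore implements ExternalStore {
    private final Deque<byte[]> items = new ArrayDeque<>();

    public void append(byte[] payload) { items.addLast(payload); }

    public byte[] removeFirst() { return items.pollFirst(); }

    public int size() { return items.size(); }
}

/** Queue with mailbox-style operations that delegates every call to the store. */
class StoreBackedQueue {
    private final ExternalStore store;

    StoreBackedQueue(ExternalStore store) {
        this.store = store;
    }

    /** Messages must be serialized before they can leave the JVM. */
    void enqueue(String message) {
        store.append(message.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns the oldest message, or null when nothing is queued. */
    String dequeue() {
        byte[] raw = store.removeFirst();
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    int numberOfMessages() { return store.size(); }

    boolean hasMessages() { return store.size() > 0; }
}
```

The main design point is that every enqueue and dequeue becomes a
round-trip to the store plus a serialization step, so the durability
comes at the cost of per-message latency.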

Anyway, I just wanted to get everyone's thoughts on this. Does this seem
like a viable alternative to pulling?
