On 20/12/2013 18:38, Sridhar Sreenivasan wrote:
Hi Rory,
Thanks for the response. It clarifies some questions. But I have
more based on your response-
Hi Sridhar,
You are welcome, here goes.
"it would be easy to pause the poller if it is getting too far ahead
of the workers. "
How can this be accomplished?
The MasterPoller (MP) periodically sends scan messages to the
RegionalPollers (RP).
Each RP then runs one or more queries against some remote resource
(e.g. an ftp server) and generates a list of files to process. I
would do this with a future, since it is a risky operation that might
fail. The future sends a FileList(files) back to the RP. The RP can
then add the work to a queue, from which it is handed out to an
available actor.
Idle actors are cheap, and having a fixed-size pool allows you to control
the maximum parallelism (to avoid starvation elsewhere).
Last time I did something similar I created a fixed set of children -
each would request work from the RP (being put into an available workers
queue if there is no work).
This means you have to manage two lists - the list of outstanding work
and the list of free children - and you can easily make decisions over
what to do when, for example, the work queue gets too large (you are too
far behind), e.g. you might want to drop the older items (or reject the
new request from the MP with a notification).
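
To make the two-list bookkeeping concrete, here is a minimal sketch in
plain Scala, with no Akka dependency. Every name in it (Backlog,
Assignment, maxPending, offer, workerFree) is made up for illustration,
and dropping the oldest item is only one of the overflow policies
mentioned above. An RP actor could hold one of these as its state and
call it from receive.

```scala
// Hypothetical names throughout; nothing here is Akka API.
final case class Assignment(worker: String, file: String)

// State an RP could keep: outstanding files and idle workers.
// Invariant: if any worker is idle, there is no pending work.
final case class Backlog(
    pending: Vector[String] = Vector.empty,
    idle: Vector[String] = Vector.empty,
    maxPending: Int = 1000) {

  // A worker asks for work: hand out the oldest file, or park the worker.
  def workerFree(worker: String): (Backlog, Option[Assignment]) =
    if (pending.isEmpty) (copy(idle = idle :+ worker), None)
    else (copy(pending = pending.tail), Some(Assignment(worker, pending.head)))

  // A file arrives: give it to an idle worker if there is one, otherwise
  // queue it. The third element is the file dropped on overflow, if any.
  def offer(file: String): (Backlog, Option[Assignment], Option[String]) =
    if (idle.nonEmpty)
      (copy(idle = idle.tail), Some(Assignment(idle.head, file)), None)
    else {
      val queued = pending :+ file
      if (queued.size > maxPending)
        (copy(pending = queued.tail), None, Some(queued.head)) // drop oldest
      else
        (copy(pending = queued), None, None)
    }
}
```

Returning the dropped file lets the caller decide whether to log it or
notify the MP. Rejecting the new file instead of dropping the oldest
would be a small change to the overflow branch.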
The other option is that the RP generates a future for each work task -
and the future sends the result back to the RP when it's complete.
I understand that the actors created as part of another actor will be
children. But question is what will be the best pattern to create the
actors?
1. In "preStart"-
If we create N instances of actors (using
SmallestMailBoxRouter), then for the duration of the actor system,
there can be up to N instances, right? So for every polling period, the
number of instances of the worker actor will be up to 'N'.
I personally have avoided the routers as I have wanted more explicit
control (e.g. dropping) when the message backlog gets too great. If your
RP generates millions of files you are going to generate huge mailboxes
if the workers do not keep up.
2. In "receive"
Creating N instances of actors per message received. So for
every polling period, we are going to be incrementing the number of
actors.
You can shut down the actors when they have finished.
If I stop the worker actor after the file processing, will there be
additional overhead caused by starting them again for the next polling
period?
The cost of starting and stopping is very low; unless you are
working in a very high throughput environment I doubt you would even
notice the overhead.
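
For illustration, the create-in-receive-and-stop option could look
roughly like this. It is a sketch only, assuming the classic actor API
of Akka 2.2.x; the message classes (FileList as a case class, Process,
Done) are made up for the example, and the file handling itself is left
as a comment.

```scala
import akka.actor.{Actor, Props}

// Hypothetical messages for this sketch.
case class FileList(files: List[String])
case class Process(file: String)
case class Done(file: String)

class FileWorker extends Actor {
  def receive = {
    case Process(file) =>
      // ... read the file and send it downstream here ...
      context.parent ! Done(file)
      context.stop(self) // one-shot worker: stop once the file is handled
  }
}

class RegionPoller extends Actor {
  def receive = {
    case FileList(files) =>
      // One short-lived child per file, created inside receive.
      files.foreach { file =>
        context.actorOf(Props[FileWorker]) ! Process(file)
      }
    case Done(file) =>
      // The child has already stopped itself; nothing to clean up here.
  }
}
```

Note that this version puts no cap on the number of children alive at
once, so a very large FileList would start that many workers at the
same time.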
I suggest you join the Coursera course and watch the last three weeks of
lectures. Roland covers pretty much all of this (and significantly
better than my descriptions).
Cheers
Rory
Regards,
Sridhar.
On Thursday, December 19, 2013 6:25:15 PM UTC-8, Sridhar Sreenivasan
wrote:
Hi,
I am new to Akka, and have been designing an application in
Akka (2.2.3)/Scala. The gist of the application is to poll for files
from an external source every minute, process (read) each file, and
send it for downstream processing.
I have a master actor that creates region pollers (the external
source can be located in multiple geo regions). Each region poller
then polls for files in its specific region and creates child
actors that process the files.
Current mechanism-
1. MasterPoller created and scheduled periodically, by sending a
message in a Main
2. The MasterPoller actor creates the RegionPoller actor upon
receiving the message
3. The RegionPoller creates the FileWorker actors upon receiving
the message
4. FileWorker processes the files.
First question: is it an acceptable pattern to create actors
within the "receive" method, or should they be created as part of
the "preStart" of each actor? Should I terminate the actors in
steps 3 and 2 after processing? Or if I don't, will they be
reused when the MasterPoller creates the actors again? I fear that
with the implementation I currently have, the number of actors is
going to keep growing with no explicit "stop". But what's the
performance impact of stopping the actors and recreating them?
Secondly, if I create the actors as part of "preStart" then they
will be reused. But consider the case where a FileWorker is still
processing when the MasterPoller is scheduled for the next run and
sends a message to the RegionPoller. When the RegionPoller sends the
message to the FileWorker, it's going to be blocked, right?
Any suggestions on what the best practices are?
Regards,
Sridhar.
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://akka.io/faq/
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google
Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/groups/opt_out.