On 20/12/2013 18:38, Sridhar Sreenivasan wrote:
Hi Rory,
Thanks for the response. It clarifies some questions, but I have more based on your response:

Hi Sridhar,

You are welcome, here goes.
"it would be easy to pause the poller if it is getting too far ahead of the workers. "
How can this be accomplished?

The MasterPoller (MP) periodically sends scan messages to the RegionalPollers (RP).

Each RP then runs one or more queries against some remote resource (e.g. an FTP server) and generates a list of files to process. I would do this with a future, since it is a risky operation that might fail. The future sends a FileList(files) back to the RP. The RP can then add the work to a queue, from which it is handed out to an available actor. Idle actors are cheap, and having a fixed-size pool lets you control the maximum parallelism (to avoid starvation elsewhere). Last time I did something similar I created a fixed set of children; each would request work from the RP and be put into an available-workers queue if there was no work.

This means you have to manage two lists, the list of outstanding work and the list of free children, and you can easily make decisions about what to do when, for example, the work queue gets too large (you are too far behind): you might drop the older items, or reject the new request from the MP with a notification.
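
A minimal sketch of that two-list bookkeeping, written as plain Scala with no Akka dependency so the logic is easy to follow. All names here (Backlog, Assignment, offerWork, workerReady, maxPending) are hypothetical and not from the original design. It assumes the "drop the oldest item" policy when the backlog is full; inside an actor this state would live in the RP and be updated from its receive.

```scala
import scala.collection.immutable.Queue

// Hypothetical names throughout; a plain-Scala model of the RP's state.
final case class Assignment(worker: String, file: String)

final case class Backlog(
    maxPending: Int,
    pending: Queue[String] = Queue.empty,   // outstanding work, oldest first
    idle: Queue[String] = Queue.empty,      // workers waiting for work
    dropped: Vector[String] = Vector.empty  // files discarded because we fell behind
) {

  // A new file arrives: hand it to an idle worker if there is one,
  // otherwise queue it. If the queue overflows, the oldest file is dropped.
  def offerWork(file: String): (Backlog, Option[Assignment]) =
    idle.dequeueOption match {
      case Some((worker, rest)) =>
        (copy(idle = rest), Some(Assignment(worker, file)))
      case None =>
        val queued = pending.enqueue(file)
        if (queued.size > maxPending) {
          val (oldest, kept) = queued.dequeue
          (copy(pending = kept, dropped = dropped :+ oldest), None)
        } else (copy(pending = queued), None)
    }

  // A worker asks for work: give it the oldest pending file,
  // otherwise park it in the idle queue.
  def workerReady(worker: String): (Backlog, Option[Assignment]) =
    pending.dequeueOption match {
      case Some((file, rest)) =>
        (copy(pending = rest), Some(Assignment(worker, file)))
      case None =>
        (copy(idle = idle.enqueue(worker)), None)
    }
}
```

Each call returns the updated state plus an optional assignment, so the caller decides how to deliver the work (for an actor, by sending a message to the chosen child). Rejecting new requests instead of dropping old ones would only change the overflow branch in offerWork.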

The other option is for the RP to generate a future for each work task; the future sends the result back to the RP when it is complete.
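
A rough sketch of the future-per-task option, using only scala.concurrent from the standard library. The names FileResult, processFile, processAll and runBlocking are hypothetical, and processFile is a stand-in for the real work. Failures are captured in a Try so every task produces a result. In an actor you would not block; the usual Akka approach is to pipe each future's result back to the RP as a message (the pipeTo pattern from akka.pattern.pipe).

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical result message: one per file, success or failure.
final case class FileResult(file: String, outcome: Try[Int])

object FutureTasks {
  // Stand-in for the real (risky) work: returns the name length,
  // and fails on an empty name to simulate an error.
  def processFile(file: String): Int =
    if (file.isEmpty) throw new IllegalArgumentException("empty file name")
    else file.length

  // One future per file. Try captures the failure, so each future
  // completes successfully with a FileResult either way.
  def processAll(files: Seq[String])(implicit ec: ExecutionContext): Future[Seq[FileResult]] =
    Future.sequence(files.map { f =>
      Future(Try(processFile(f))).map(outcome => FileResult(f, outcome))
    })

  // Blocking helper for demos only; an actor should not block like this.
  def runBlocking(files: Seq[String]): Seq[FileResult] = {
    import ExecutionContext.Implicits.global
    Await.result(processAll(files), 10.seconds)
  }
}
```

Note that this option gives no natural bound on parallelism beyond the execution context's thread pool, which is one reason to prefer the fixed set of children when the file list can be very large.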

I understand that actors created by another actor will be its children. But the question is, what is the best pattern for creating the actors?
1. In "preStart"-
If we create N instances of actors (using SmallestMailboxRouter), then for the lifetime of the actor system there can be up to N instances, right? So for every polling period, the number of worker actor instances will be up to 'N'.

I personally have avoided the routers because I wanted more explicit control (e.g. dropping messages) when the message backlog gets too great. If your RP generates millions of files, you are going to build up huge mailboxes if the workers do not keep up.
2. In "receive"
Creating N instances of actors per message received. So for every polling period, we are going to be incrementing the number of actors.

You can shut down the actors when they have finished.
If I stop the worker actor after the file processing, will there be additional overhead from starting it again for the next polling period?

The cost of starting and stopping actors is very low. Unless you are working in a very high-throughput environment, I doubt you would even notice the overhead.


I suggest you join the Coursera course and watch the last three weeks of lectures. Roland covers pretty much all of this (and significantly better than my descriptions).

Cheers

Rory

Regards,
Sridhar.

On Thursday, December 19, 2013 6:25:15 PM UTC-8, Sridhar Sreenivasan wrote:

    Hi,
        I am new to Akka, and have been designing an application in
    Akka (2.2.3)/Scala. The gist of the application is to poll for files
    from an external source every minute, process each file (read it) and
    send it for downstream processing.
    I have a master actor that creates region pollers (the external
    source can be located in multiple geo regions). Each region poller
    then polls for files in its specific region and creates child
    actors that process the files.
    Current mechanism-
    1. MasterPoller created and scheduled periodically, by sending a
    message in a Main
    2. The MasterPoller actor creates the RegionPoller actor upon
    receiving the message
    3. The RegionPoller creates the FileWorker actors upon receiving
    the message
    4. FileWorker processes the files.

    First question: is it an acceptable pattern to create actors
    within the "receive" method, or should they be created as part of
    the "preStart" of each actor? Should I terminate the actors in
    steps 3 and 2 after processing? Or if I don't, will they be
    reused when the MasterPoller creates the actors again? I fear that
    with the implementation I currently have, the number of actors is
    going to keep growing with no explicit "stop". But what's the
    performance impact of stopping the actors and recreating them?
    Secondly, if I create the actors as part of "preStart" then they
    will be reused. But consider the case where a FileWorker is still
    processing when the MasterPoller is scheduled for the next run and
    sends a message to the RegionPoller. When the RegionPoller sends
    the message to the FileWorker, it's going to be blocked, right?
    Any suggestions on what the best practices are?

    Regards,
    Sridhar.

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://akka.io/faq/
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/groups/opt_out.
