Hi,

On Fri, Nov 28, 2014 at 11:06 AM, Erol Merdanović <[email protected]>
wrote:

> Hi
>
> We have an app (Play Framework) which accepts files from different
> devices. Devices are logging some values into a file and send that file
> every 5 minutes. The app accepts the file, parses it and processes it.
>
> Actual steps are
> 1. parse the file
> 2. recalculate values (we get values as 0-100%, so we need to go to
> database to read actually range for each value and recalculate it)
> 3. filter zero values (if each value is < X, then value = 0 where X is
> again obtained from database for each value)
> 4. calculate calculation values (we read formulas from database and create
> new calculated values from the uploaded values)
> 5. we store values to timeseries database (kairosdb)
> 6. every value belongs to different tag (tag is like pressure, temperature
> ..) so when we get a value, we activate the tag (update in database)
> 7. store last value for each tag (again, make an update in the database)
> 8. update device status (we update the last uploaded time for each device
> in the database)
> 9. check for alarms (for each alarm we fetch avg value from timeseries
> database and if it's >, < or == to alarm value, activate alarm and
> insert/update alarm log into database)
>
> To do all of this, it takes around 200ms-300ms on our production machine.
>
> Problems:
> 1. Should I actually split this into separate actors? Right now it's in
> one big actor which is calling injected services.
>

Depends on how much you want to be able to run in parallel. One single
instance of this big actor is obviously not an option. One instance per
device could work.

There are a lot of database calls involved and if those are blocking calls
(e.g. jdbc) you should make sure that you use a dedicated dispatcher for
those steps to manage the blocking threads carefully and isolate them from
other non-blocking parts of the application. That can be a reason to
separate the steps into separate actors.

Another reason is that separate actors will enable pipelined execution,
e.g. one chunk of values can be calculated in step 4 while another chunk of
values are stored to the timeseries database in step 5.

Also, it can be good to split the steps into separate actors to make them
easier to test.

It doesn't sound like you need to distribute the processing steps to
several machines, but that could otherwise also be a reason to make them
separate actors.


> 2. How to make sure each actor is called after another (to have sequential
> processing)? Should each actor pass the message to another actor or is
> there some better way (supervisor/parent? )?
>

There are several ways. One way is to have a coordinator that delegates
results from one step to the next step. Then the coordinator have all the
references to the workers. Another way is to pass the references in the
messages, and each worker pass on the result to the next step.


> 3. What happens when I want to skip certain step (for example, to skip
> checking for alarms)? Should there be skipAlarms = true in the message or I
> should implement this somehow else?
> 4. Should there be any problems when I'm passing collection (in our case
> Set of values) between actors?
>

Make sure that the messages are immutable. Are you using Java? Java
collections are not immutable, so make sure that you copy them when passing
in messages, or at least that you don't modify the same collection from two
different actors.


> Right now I'm inserting around 20k files and I'm noticing big memory
> consumption so I might even have a memory leak somewhere.
>

Here is the challenge with all this. You must make sure that you don't read
in more records than you can consume, otherwise you risk out of memory. For
example, reading and parsing the file might be fast but the next step
involving the database might be slower, then you have a lot of records in
memory, in the mailbox of the Step2 actor. To solve this you need to
implement flow control.

The simplest flow control is that each step requests previous step for one
more record when it is done with a record. It does not send the result to
the next step until that has requested for a record. Everything message
based, of course. This propagates all the way up to the file reading.

If you find that the one-by-one request scheme is limiting performance you
can request records in batches, such as "I'm ready to processes 100
records".

Incidentally, we are currently working on Akka Streams that supports this
kind of backpressured stream processing. It's in preview experimental state
right now, i.e. not ready for production.

Read more about it, and try it here:
http://typesafe.com/activator/template/akka-stream-java8
http://typesafe.com/activator/template/akka-stream-scala

Regards,
Patrik


>
> One of the ideas that I had was to create one parent actor for each
> device. The parent actor passes the message to child actor (9 steps) and
> waits for an answer. When it gets it, pass the message to another step.
> There I could then have some kind of logic to skip certain steps.
>
> For any suggestions and comments I would be really grateful.
>
>  --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>



-- 

Patrik Nordwall
Typesafe <http://typesafe.com/> -  Reactive apps on the JVM
Twitter: @patriknw

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Reply via email to