Dear group,

Please share your wisdom with this humble Akka first-timer on how best to
approach the design of the following system.

The system is an autonomous action engine for IoT facilities.

It observes groups of buildings. The observed data are potentially huge
streams of sensor events (presence, temperature, indoor positioning, ...),
streams of facility events (room booked, booking canceled, work order
completed, ...), as well as relational (Oracle) and non-relational (HBase)
data sources which provide additional state and metadata about locations
and are continuously edited by end users.

Rules are evaluated periodically against these observations to decide
which autonomous actions to take. An example use case is to detect when a
booked room is unoccupied five minutes after the booking starts, notify the
user (through Skype and/or email) that their booking will be canceled
automatically in another five minutes unless they respond that they
explicitly want to keep it, and then cancel the booking if necessary. There
are, however, many more use cases, and what they all have in common is
reasoning across time about the state of the system and taking sequences of
actions that involve user interaction.

The entire system is multi-tenant, in the sense that there are multiple
orgs with their own sets of buildings, users and sensors, which are
separated at the data storage level (separate databases, separate Kafka
streams) but share a server cluster. The system also runs 24/7, meaning
that rolling upgrades are the norm and that reconfigurations of locations,
sensor deployment or rules happen in near real time.
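
To make the unoccupied-room use case concrete, here is a minimal sketch of
the behaviour I have in mind, written in plain Python rather than as Drools
rules or Akka code. All names (Booking, Action, evaluate) are made up for
illustration, and it assumes occupancy is available as a simple yes/no at
evaluation time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

GRACE = timedelta(minutes=5)     # unoccupied time after start before warning
RESPONSE = timedelta(minutes=5)  # time the user gets to respond to the warning


class Action(Enum):
    NONE = "none"
    NOTIFY_USER = "notify_user"
    CANCEL_BOOKING = "cancel_booking"


@dataclass
class Booking:
    start: datetime
    notified_at: Optional[datetime] = None  # when the warning was sent
    keep_confirmed: bool = False            # user replied "keep it"
    cancelled: bool = False


def evaluate(booking: Booking, now: datetime, room_occupied: bool) -> Action:
    """One periodic evaluation of the rule for a single booking."""
    if booking.cancelled or booking.keep_confirmed or room_occupied:
        return Action.NONE
    if booking.notified_at is None:
        # Not warned yet: warn once the grace period has passed.
        if now >= booking.start + GRACE:
            booking.notified_at = now
            return Action.NOTIFY_USER
        return Action.NONE
    # Already warned: cancel once the response window has passed.
    if now >= booking.notified_at + RESPONSE:
        booking.cancelled = True
        return Action.CANCEL_BOOKING
    return Action.NONE
```

The point is that each evaluation depends on what happened in earlier
evaluations (was the user notified, and when), which is the "reasoning
across time" that every use case shares.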

I was considering trying to shoehorn this into Spark Streaming, which we
already use, but Akka seemed to me to be the better choice for building
this system. I hope I'm not wrong about that.

This is the direction I'm thinking in, but I may be way off base:

- Build actors which are responsible for bringing the external state into
the actor system. These tap into the databases and Kafka streams to keep an
up-to-date state of their particular domain (location sensor measurements,
bookings, location properties, rule settings, etc.).
- Build an actor that wraps a stateful Drools session to perform the rule
evaluation. On init it registers itself with the state actors to fetch the
complete initial state and add it to the Drools session, and it then
receives update events to keep the session state current (although that may
cause mayhem when state actors restart, so I'm not quite sure how best to
solve this). The rules are injected at actor/session creation time (from
the separate actor which manages the rules), and a rules change means a
restart of this actor.
- A separate actor that asks the Drools actor to evaluate the rules and
receives the results of the evaluation, i.e. all the actions that are
supposed to be taken. It dispatches these to a gaggle of actors which are
responsible for the various actions (make bookings, cancel bookings, send
emails, etc.), which I'll try to keep pretty much stateless and therefore
easy to cluster.
- Shard the stateful parts of this solution by tenant initially, and
later, if the Drools session becomes too big, also by location hierarchy
(different regions are different groups of actors). Run the different
shards on different servers through some kind of automatic load balancing
mechanism.
- A server going down means recreating the actors on another server and
fetching all state again prior to evaluating rules. This could easily take
a minute or two, but given that all state is already persisted somewhere,
it should be feasible to reconstruct the actor state without having
persistent actors.
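
For the state actor restart problem in the second bullet, one idea I could
imagine (a sketch only, in plain Python and not tied to any Akka or Drools
API) is to have every snapshot and update carry a sequence number, so the
session actor can tell duplicates from gaps. SessionView and its methods
are hypothetical names, and the sketch assumes each state source numbers
its changes consecutively.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SessionView:
    """What the rule-session actor would hold for one state source."""
    facts: Dict[str, Any] = field(default_factory=dict)
    version: int = -1  # sequence number of the last change applied

    def load_snapshot(self, facts: Dict[str, Any], version: int) -> None:
        # Replace everything: used on first start and after a detected gap.
        self.facts = dict(facts)
        self.version = version

    def apply_update(self, key: str, value: Any, version: int) -> str:
        if version <= self.version:
            # Duplicate, or already covered by the snapshot.
            return "ignored"
        if version != self.version + 1:
            # Something was missed (e.g. the state actor restarted);
            # the caller should fetch a fresh snapshot.
            return "resync"
        self.facts[key] = value
        self.version = version
        return "applied"
```

With this, a restarted state actor that replays old events or skips some
would at worst trigger a new snapshot fetch instead of silently corrupting
the session state.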

What I'm especially unclear about is the following:
1. How do I load balance this whole affair on a cluster? How do I avoid
ending up with an unequal distribution of actors after a rolling upgrade?
2. How do I ensure optimal communication, so that actors within a 
particular shard end up on the same server?
3. How do you monitor something like this? Is it possible to set something 
up that can give me trendlines for resource usage so I have enough lead 
time to adapt to increased load in a particular tenant?
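
To illustrate what I mean in question 1, this is roughly the outcome I am
hoping for, sketched in plain Python. It is not an Akka API; rebalance and
the node/tenant names are invented, and it naively treats every tenant
shard as equally heavy.

```python
from typing import Dict, List


def rebalance(assignment: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Move shards from the busiest node to the idlest node until the
    shard counts differ by at most one. Returns a new assignment."""
    nodes = {node: list(shards) for node, shards in assignment.items()}
    if not nodes:
        return nodes
    while True:
        busiest = max(nodes, key=lambda n: len(nodes[n]))
        idlest = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[busiest]) - len(nodes[idlest]) <= 1:
            return nodes
        nodes[idlest].append(nodes[busiest].pop())


# After a rolling upgrade the node restarted last is typically empty.
before = {
    "node-a": ["t1", "t2", "t3"],
    "node-b": ["t4", "t5", "t6"],
    "node-c": [],
}
after = rebalance(before)
```

Because a whole tenant shard is the unit that moves, all actors belonging
to that shard would stay together on one server, which is what question 2
is after.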

I would be really happy to get any advice, because I'm definitely out of my 
depth on this one.

Thanks
