Dear group,

Please share your wisdom with this humble Akka first-timer on how best to approach the design of the following system.

The system is an autonomous action engine for IoT-enabled facilities. It observes groups of buildings. The observed data are:

- potentially huge streams of sensor events (presence, temperature, indoor positioning, ...),
- streams of facility events (room booked, booking canceled, work order completed, ...),
- relational (Oracle) and non-relational (HBase) data sources, which provide additional state and metadata about locations and are continuously edited by end users.

Based on these observations, rules are evaluated periodically to decide which autonomous actions to take. An example use case: detect that a booked room is still unoccupied five minutes after the booking starts, notify the user (through Skype and/or email) that the booking will be canceled automatically in another five minutes unless they respond that they explicitly want to keep it, and then cancel the booking if necessary. There are many more use cases, but what they all have in common is reasoning across time about the state of the system and taking sequences of actions that involve user interaction.

The entire system is multi-tenant: there are multiple orgs with their own sets of buildings, users and sensors. Tenants are separated at the data storage level (separate databases, separate Kafka streams) but share a server cluster. The system also runs 24/7, meaning that rolling upgrades are the norm and that reconfigurations of locations, sensor deployments or rules happen in near real time.

I considered trying to shoehorn this into Spark Streaming, which we already use, but Akka seemed to me the better choice to build this system on. I hope I'm not wrong at that level.

This is the direction I'm thinking in, but I may be way off base:

- Build actors which are responsible for bringing the external state into the actor system. These tap into the databases and Kafka streams to keep an up-to-date state of their particular domain (location sensor measurements, bookings, location properties, rule settings, etc.).
- Build an actor that wraps a stateful Drools session to perform the rule evaluation. On init it registers itself with the state actors to fetch the complete initial state and add it to the Drools session, and from then on it receives update events to keep the session state current (although that may cause mayhem when state actors restart, so I'm not quite sure how best to solve this). The rules are injected at actor/session creation time (from a separate actor which manages the rules), and a rules change means a restart of this actor.
- Build a separate actor that asks the Drools actor to evaluate the rules and receives the results of the evaluation, i.e. all the actions that are supposed to be taken. It dispatches from there to a gaggle of actors which are responsible for the various actions (make bookings, cancel bookings, send emails, etc.), which I'll try to keep pretty much stateless and therefore easy to cluster.
- Shard the stateful parts of this solution by tenant initially, and later, if the Drools session becomes too big, also by location hierarchy (different regions are different groups of actors). Run the different shards on different servers through some kind of automatic load-balancing mechanism.
- A server going down means recreating the actors on another server and fetching all state again before evaluating rules. This could easily take a minute or two, but given that all state is already persisted somewhere, it should be feasible to reconstruct the actor state without having persistent actors.

What I'm especially unclear about is the following:

1. How do I load balance this whole affair on a cluster? How do I avoid ending up with an unequal distribution of actors at the end of a rolling upgrade?
2. How do I ensure optimal communication, so that actors within a particular shard end up on the same server?
3. How do you monitor something like this? Is it possible to set something up that gives me trendlines for resource usage, so that I have enough lead time to adapt to increased load in a particular tenant?

I would be really happy to get any advice, because I'm definitely out of my depth on this one.

Thanks
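
P.S. To make the example use case a bit more concrete, here is a minimal sketch of the no-show rule as a pure function in plain Java, with no Akka or Drools involved. All names (NoShowRule, decide, Action) are made up for illustration, and I'm assuming the time of the notification is tracked somewhere so that the second five-minute window can be measured from it.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the "unoccupied booked room" rule; not tied to any real API.
final class NoShowRule {
    enum Action { NONE, NOTIFY_USER, CANCEL_BOOKING }

    // Both windows are five minutes in the example use case.
    static final Duration NOTIFY_AFTER = Duration.ofMinutes(5);
    static final Duration CANCEL_AFTER_NOTIFY = Duration.ofMinutes(5);

    /**
     * Decides what to do for one booking at evaluation time `now`.
     * `notifiedAt` is null while no warning has been sent yet (an assumption
     * about how the notification state is tracked).
     */
    static Action decide(Instant bookingStart, Instant now, boolean occupied,
                         Instant notifiedAt, boolean keepRequested) {
        // An occupied room, or an explicit "keep it" reply, ends the sequence.
        if (occupied || keepRequested) {
            return Action.NONE;
        }
        if (notifiedAt == null) {
            // First step: warn once the room has been empty for five minutes.
            return now.isBefore(bookingStart.plus(NOTIFY_AFTER))
                    ? Action.NONE : Action.NOTIFY_USER;
        }
        // Second step: cancel five minutes after the warning was sent.
        return now.isBefore(notifiedAt.plus(CANCEL_AFTER_NOTIFY))
                ? Action.NONE : Action.CANCEL_BOOKING;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2016-01-01T09:00:00Z");
        Instant warned = start.plus(NOTIFY_AFTER);
        System.out.println(decide(start, start.plusSeconds(60), false, null, false));
        System.out.println(decide(start, warned, false, null, false));
        System.out.println(decide(start, warned.plusSeconds(300), false, warned, false));
    }
}
```

In the design I described, this decision would live in a Drools rule and the resulting actions would be handed to the action actors; the sketch is only meant to pin down the timing logic that has to be evaluated periodically.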
