Hello fellow hakkers, we're developing a proof-of-concept application to evaluate akka and akka-persistence, and we've stumbled upon a maze of twisty little passages, all alike. ;) We've got the command/storage side of the CQRS paradigm implemented, nicely sharded with akka-cluster, and are now worrying about dealing with the query side in an efficient manner. Specifically, we're wondering how to respond to "complex" queries that are predicated on the state wrapped by the actors. We've found several approaches that could conceivably work.
Let's say we have a web shop with a single aggregate root

*Item(val persistenceId: String, val name: String, val kind: SomeEnum, var amountAvailable: Integer, var lastSold: Date)*

Given its persistenceId, we can easily talk to the actor representing an item by messaging the shard region, which will forward the message, waking up the actor if necessary. Let's say the actor responds to some query message with MyState(name, kind, amountAvailable).

Now we need to be able to respond to queries such as
A) "give me all items",
B) "give me all fruits" (kind==Fruit),
C) "give me all sold out items" (amountAvailable==0),
D) "give me all items sold in February and March".

Let's say it's sufficient to get a list of persistenceIds as the result of these queries. A is the "trivial" catch-all, B refers to actor state that is immutable, C and D refer to dynamically changing actor state. B partitions the entire set based on a simple predicate. The same is true for C if you only care about sold out/not sold out. D represents an index over the set of Items.

*Approach #1: Do it all in akka and akka-persistence, naively*

A) Have a second kind of persistent actor that can respond with all assigned persistenceIds. We get this "for free" from the way we assign persistenceIds in the first place: we increment a long id maxAssigned when creating a new Item. The actor responds with a List(0, 1, ..., maxAssigned).

B) Get ALL persistenceIds using A), send a query for the state to each Item, and filter based on the kind.

C) and D) The same trivial solution as in B) would work here, too.

Downside: Each time we query, we might wake up ALL existing Items, replaying their journals. This is probably not a good idea even if all Items fit into memory. Each time we query we also message all actors and process all replies. That sounds like a lot of work, even if it's just two lines of Scala code!

*Approach #2: Do it all in akka/akka-persistence but re-implement half of a database on top of it*

A) Same as above.

B) Have several persistent KindActors(val kind: SomeEnum) that are messaged whenever a new Item of their kind is created, and that store all persistenceIds in a list. Given a way to get the KindActor for fruit, you can easily get the persistenceIds of fruits.

C) Have a persistent SoldOutActor that is messaged whenever the amountAvailable of an Item changes and that removes the item from or adds it to a set of sold out items.

D) Have a persistent LastSoldActor that is messaged whenever lastSold changes and that manages tuples (persistenceId, lastSold) in a search tree. This actor can efficiently respond to queries (from, to) with all persistenceIds where from<lastSold<to.

Downside: A new persistent actor for every kind of query. A flurry of messages for every write to update the various views and indices. Annoying and expensive to bootstrap if you introduce new queries to an existing system. Events (in event-sourcing parlance) are stored multiple times: e.g. ItemSold is stored in Item to update lastSold and in LastSoldActor to update the index. Overall it still feels fairly expensive in terms of resources (memory, storage, cpu time, programmer time).

This approach is basically where we're at now. Instead of having one actor per query, you could have a single "parent" actor that stores a list of all persistenceIds as well as various predicated or sorted indices over them. It's conceivable to get around the persistence requirement for the query actors (and some downsides along with it) by essentially recreating the partitions, views and indices from scratch every time, but that's probably prohibitively expensive. And it seems as if you're basically re-implementing a database on top of akka-persistence.

The improvements to the query side of akka-persistence may make this approach less work-intensive. And maybe there is another library that can remove some of the workload.

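To make D) a bit more concrete, here is a minimal sketch of the state a LastSoldActor might hold, written as plain Scala without any Akka wiring. The names LastSoldIndex, soldAtMillis, updated and soldBetween are made up for illustration, and representing lastSold as epoch milliseconds is an assumption, not something we have settled on.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical event: the name ItemSold is from the post, the fields are assumed.
final case class ItemSold(persistenceId: String, soldAtMillis: Long)

// Sorted index (lastSold -> ids) plus a reverse lookup (id -> lastSold),
// so an item's old entry can be removed when it is sold again.
final case class LastSoldIndex(
    byTime: TreeMap[Long, Set[String]] = TreeMap.empty[Long, Set[String]],
    byId: Map[String, Long] = Map.empty[String, Long]
) {

  // Apply one ItemSold event and return the updated index.
  def updated(event: ItemSold): LastSoldIndex = {
    // Drop the item's previous entry, if it has one.
    val cleaned = byId.get(event.persistenceId) match {
      case Some(old) =>
        val remaining = byTime.getOrElse(old, Set.empty[String]) - event.persistenceId
        if (remaining.isEmpty) byTime - old else byTime.updated(old, remaining)
      case None => byTime
    }
    val bucket = cleaned.getOrElse(event.soldAtMillis, Set.empty[String]) + event.persistenceId
    LastSoldIndex(
      cleaned.updated(event.soldAtMillis, bucket),
      byId.updated(event.persistenceId, event.soldAtMillis)
    )
  }

  // All persistenceIds with from < lastSold < to (both bounds exclusive).
  def soldBetween(from: Long, to: Long): Set[String] =
    if (from >= to) Set.empty[String]
    else
      byTime
        .range(from, to)         // keys in [from, to)
        .iterator
        .filter(_._1 > from)     // make the lower bound exclusive as well
        .flatMap(_._2)
        .toSet
}
```

A persistent actor would fold each incoming ItemSold into this structure with updated and answer (from, to) queries with soldBetween. The sketch also shows the duplication mentioned above: the same event has to reach both the Item and this index.
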
*Approach #3: Replicate the data into a second database to serve the views*

All state updates are replicated (asynchronously) to a second database. The actors and their journals remain the definitive record, and transactions rely on those to prevent e.g. selling an item that's out of stock. Queries such as B, C and D are trivially implemented in SQL and NoSQL DBs, and you can easily add indices whenever you want. This also works around the current (but eventually to be solved) lack of views over multiple persistent actors.

Downside: Also duplicates all data. Seems like a sort-of big hammer to solve the problem. Need to update the schema in multiple places when you add or remove attributes. Unless you use the same DB as persistence journal and as view DB, you've now got two distributed DBs to administer. I'm not sure if there are more downsides?

*Approach #4: Transform the event journals themselves to serve the views*

Obviously the persistent actors append their events to an existing DB. So the DB has -- in theory -- all the requisite information to answer the queries A-D. You'd need to tell it how to interpret the events, though, and how the queries relate to them. And do it all in an efficient manner. Eventstore projections may be able to do some or all of this.

Downside: Need to write and maintain code in another language -- possibly JavaScript. Either the code will be almost a duplicate of the code in your persistent actors (interpret an individual item actor's events to get a (virtual?) representation of the actor's current state, which can be queried), or, more probably, it will be wildly different (e.g. interpret all item actors' events to be able to answer specifically query C).

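Whichever store ends up running it, the "interpret the events" step in approaches #3 and #4 boils down to a fold over the event stream. Below is a rough sketch in plain Scala, not in Eventstore's projection language and without any Akka or database code. The event names ItemCreated and AmountChanged, the Kind values, and the ReadModel type are all assumptions made up for this example.

```scala
// Assumed stand-in for SomeEnum; Vegetable is a made-up second value.
sealed trait Kind
object Kind {
  case object Fruit extends Kind
  case object Vegetable extends Kind
}

// Hypothetical events as they might appear in the Items' journals.
sealed trait ItemEvent { def persistenceId: String }
final case class ItemCreated(persistenceId: String, name: String, kind: Kind, amountAvailable: Int)
    extends ItemEvent
final case class AmountChanged(persistenceId: String, amountAvailable: Int) extends ItemEvent

// One row per Item, roughly what a table in a second database would hold.
final case class ItemRow(name: String, kind: Kind, amountAvailable: Int)

final case class ReadModel(rows: Map[String, ItemRow] = Map.empty[String, ItemRow]) {

  // Interpret a single event and return the updated read model.
  def applyEvent(event: ItemEvent): ReadModel = event match {
    case ItemCreated(id, name, kind, amount) =>
      copy(rows = rows.updated(id, ItemRow(name, kind, amount)))
    case AmountChanged(id, amount) =>
      rows.get(id) match {
        case Some(row) => copy(rows = rows.updated(id, row.copy(amountAvailable = amount)))
        case None      => this // event for an unknown item: ignored in this sketch
      }
  }

  // Query A: all items.
  def allIds: Set[String] = rows.keySet

  // Query B: all items of a given kind.
  def idsOfKind(kind: Kind): Set[String] =
    rows.collect { case (id, row) if row.kind == kind => id }.toSet

  // Query C: all sold out items.
  def soldOutIds: Set[String] =
    rows.collect { case (id, row) if row.amountAvailable == 0 => id }.toSet
}
```

Note how applyEvent duplicates logic the Item actor already has, which is the first downside listed for approach #4. A projection built only for query C could instead keep just the set of sold-out ids and would look quite different.
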
Further reading:

Reactive DDD with Akka part 3 (the author chooses approach #3) -- http://pkaczor.blogspot.de/2014/06/reactive-ddd-with-akka-projections.html
EventStore documentation -- https://github.com/eventstore/eventstore/wiki
Akka Persistence on the Query Side: The Conclusion -- https://groups.google.com/forum/#!topic/akka-user/MNDc9cVG1To

There's nothing stopping you from mixing different approaches, either side by side (e.g. use a DB for one kind of query and pure akka-persistence for another) or on top of one another (e.g. use akka-persistence views to fill the database). And there may be more approaches (Spark? use a non-Akka, in-memory, read-only data structure? send it all to Amazon Mechanical Turk and let real humans sort it out?) -- TIMTOWTDI strikes again.

At the moment we're not sure how to proceed. Is the above a reasonable way to talk about the various approaches? Are all of them viable? Is there a sane default approach?

Cheers
Moritz
