[akka-user] Query strategies for akka-persistence

Moritz Schallaböck Thu, 23 Oct 2014 08:29:20 -0700

Hello fellow hakkers,

we're developing a proof-of-concept application to evaluate akka and 
akka-persistence, and we've stumbled upon a maze of twisty little passages, 
all alike. ;) We've got the command/storage side of the CQRS paradigm 
implemented, nicely sharded with akka-cluster, and are now worrying about 
dealing with the query side in an efficient manner. Specifically, we're 
wondering how to respond to "complex" queries that are predicated on the 
state wrapped by the actors. We've found several approaches that could 
conceivably work.


Let's say we have a web shop with a single aggregate root
 *Item(val persistenceId: String, val name: String, val kind: SomeEnum, var 
amountAvailable: Integer, var lastSold: Date)*

Given its persistenceId, we can easily talk to the actor representing an 
item by messaging the shard region, which forwards the message and wakes 
up the actor if necessary. Let's say the actor responds to some query 
message with MyState(name, kind, amountAvailable).
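To make "waking up the actor" concrete, here is a minimal sketch of the state an Item actor might wrap, written in plain Scala with no Akka API. The event names (ItemCreated, ItemSold, ItemRestocked), the Kind values and ItemState are hypothetical. SomeEnum is modelled as a sealed trait and lastSold as an Option[LocalDate]. Recovery is a fold of the Item's journal, starting from its creation event. That is the work repeated for every Item a query has to wake.

```scala
import java.time.LocalDate

// Hypothetical stand-in for the post's SomeEnum.
sealed trait Kind
case object Fruit extends Kind
case object Vegetable extends Kind

// Hypothetical events an Item persists to its journal.
sealed trait ItemEvent
final case class ItemCreated(name: String, kind: Kind, amount: Int) extends ItemEvent
final case class ItemSold(quantity: Int, on: LocalDate) extends ItemEvent
final case class ItemRestocked(quantity: Int) extends ItemEvent

// The reply from the post: MyState(name, kind, amountAvailable).
final case class MyState(name: String, kind: Kind, amountAvailable: Int)

// Immutable Item state; lastSold stays None until the first sale.
final case class ItemState(
    persistenceId: String,
    name: String,
    kind: Kind,
    amountAvailable: Int,
    lastSold: Option[LocalDate]
) {
  def applyEvent(event: ItemEvent): ItemState = event match {
    case ItemSold(q, on)  => copy(amountAvailable = amountAvailable - q, lastSold = Some(on))
    case ItemRestocked(q) => copy(amountAvailable = amountAvailable + q)
    case _: ItemCreated   => this // an Item is created only once; repeats are ignored
  }

  def toMyState: MyState = MyState(name, kind, amountAvailable)
}

object ItemState {
  // Recovery: fold the whole journal, starting from the ItemCreated event.
  def replay(persistenceId: String, journal: Seq[ItemEvent]): Option[ItemState] =
    journal.foldLeft(Option.empty[ItemState]) {
      case (None, ItemCreated(n, k, a)) => Some(ItemState(persistenceId, n, k, a, None))
      case (Some(state), event)         => Some(state.applyEvent(event))
      case (None, _)                    => None // events before creation are ignored here
    }
}
```

In a real persistent actor, applyEvent would typically be called both during recovery and after each successful persist, so replay and live updates share one code path.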

Now we need to be able to respond to queries such as A) "give me all 
items", B) "give me all fruits" (kind==Fruit), C) "give me all sold-out 
items" (amountAvailable==0), D) "give me all items sold in February and 
March". Let's say it's sufficient to get a list of persistenceIds as the 
result of these queries. A is the "trivial" catch-all, B refers to actor 
state that is immutable, C and D refer to dynamically changing actor state. 
B partitions the entire set based on a simple predicate. The same is true 
for C if you only care about sold out/not sold out. D requires an index 
over the set of Items, ordered by lastSold, so it can be answered as a 
range query.

*Approach #1: Do it all in akka and akka-persistence, naively*

 A) Have a second kind of persistent actor that can respond with all 
assigned persistence ids. We get this "for free" from the way we assign 
persistence ids in the first place: we increment a long id, maxAssigned, 
whenever a new Item is created. The actor responds with 
List(0, 1, ..., maxAssigned).
 B) Get ALL persistenceIds using A), send a state query to each Item, and 
filter the replies by kind.
 C) and D) The same trivial solution as in B) would work here, too.

Downside: Each time we query, we might wake up ALL existing Items, 
replaying their journals. This is probably not a good idea even if all Items 
fit into memory. Each time we query, we also message all actors and process 
all replies. That sounds like a lot of work, even if it's just two lines of 
Scala code!
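The "two lines of Scala" might look roughly like the following sketch. It avoids committing to any specific Akka API. askState is a hypothetical function standing in for asking the shard region for an Item's state (for example with Akka's ask pattern). Snapshot is a hypothetical reply that extends MyState with lastSold so that D can be answered as well. Each askState call is a round trip to a possibly sleeping Item, so the cost grows with the total number of Items rather than with the size of the result.

```scala
import java.time.LocalDate
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-in for the post's SomeEnum.
sealed trait Kind
case object Fruit extends Kind
case object Vegetable extends Kind

// Hypothetical state reply: MyState plus lastSold, so that query D can be answered too.
final case class Snapshot(persistenceId: String, kind: Kind, amountAvailable: Int, lastSold: Option[LocalDate])

// Approach #1: enumerate every id (query A), ask every Item for its state, filter locally.
// `askState` stands in for a request to the shard region; each call may wake an Item
// and replay its journal before it can answer.
def naiveQuery(maxAssigned: Long, askState: String => Future[Snapshot])(p: Snapshot => Boolean)
              (implicit ec: ExecutionContext): Future[List[String]] = {
  val allIds = (0L to maxAssigned).map(_.toString).toList // List(0, 1, ..., maxAssigned)
  Future.traverse(allIds)(askState).map(_.filter(p).map(_.persistenceId))
}
```

With an in-memory stand-in for askState, `naiveQuery(maxAssigned, askState)(_.kind == Fruit)` answers B and `naiveQuery(maxAssigned, askState)(_.amountAvailable == 0)` answers C. Both touch every Item regardless of how few match.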

*Approach #2: Do it all in akka/akka-persistence but re-implement half of a 
database on top of it*

 A) Same as above.
 B) Have one persistent KindActor(val kind: SomeEnum) per kind. It is 
messaged whenever a new Item of its kind is created and stores the 
persistenceIds of those Items in a list. Given a way to get the KindActor 
for fruit, you can easily get the persistenceIds of all fruits.
 C) Have a persistent SoldOutActor that is messaged whenever the 
amountAvailable of an Item changes and adds the Item to, or removes it 
from, a set of sold-out items.
 D) Have a persistent LastSoldActor that is messaged whenever lastSold 
changes and manages (persistenceId, lastSold) tuples in a search tree. 
This actor can efficiently respond to queries (from, to) with all 
persistenceIds where from<lastSold<to.
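The state each of these query actors would keep can be sketched as plain immutable values. The actor would persist each incoming notification and apply it with `updated`. The notification messages (KindAssigned, AmountChanged, LastSoldChanged) and the index classes are hypothetical names. For D the sketch uses a TreeMap keyed by date and answers a half-open range [from, until). That shape fits "sold in February and March" (from = 1 February, until = 1 April) a little better than the strict from<lastSold<to. It also remembers each Item's current date, so a changed lastSold moves the Item instead of duplicating it.

```scala
import java.time.LocalDate
import scala.collection.immutable.TreeMap

// Hypothetical stand-in for the post's SomeEnum.
sealed trait Kind
case object Fruit extends Kind
case object Vegetable extends Kind

// Hypothetical notifications an Item actor sends to the query actors.
final case class KindAssigned(id: String, kind: Kind)
final case class AmountChanged(id: String, amountAvailable: Int)
final case class LastSoldChanged(id: String, lastSold: LocalDate)

// Explicit ordering for dates, used by the TreeMap below.
implicit val byEpochDay: Ordering[LocalDate] = Ordering.by((d: LocalDate) => d.toEpochDay)

// B) State of the KindActor for one kind: the ids of all Items of that kind.
final case class KindIndex(kind: Kind, ids: Set[String] = Set.empty) {
  def updated(msg: KindAssigned): KindIndex =
    if (msg.kind == kind) copy(ids = ids + msg.id) else this
}

// C) State of the SoldOutActor: ids of all Items whose amountAvailable is 0.
final case class SoldOutIndex(ids: Set[String] = Set.empty) {
  def updated(msg: AmountChanged): SoldOutIndex =
    if (msg.amountAvailable == 0) copy(ids = ids + msg.id) else copy(ids = ids - msg.id)
}

// D) State of the LastSoldActor: a sorted map lastSold -> ids, plus each Item's
// current lastSold so that its old entry can be removed when the date changes.
final case class LastSoldIndex(
    byDate: TreeMap[LocalDate, Set[String]] = TreeMap.empty[LocalDate, Set[String]],
    current: Map[String, LocalDate] = Map.empty
) {
  def updated(msg: LastSoldChanged): LastSoldIndex = {
    // Remove the Item from the bucket of its previous date, if it had one.
    val withoutOld = current.get(msg.id) match {
      case Some(old) =>
        val rest = byDate.getOrElse(old, Set.empty[String]) - msg.id
        if (rest.isEmpty) byDate - old else byDate.updated(old, rest)
      case None => byDate
    }
    val bucket = withoutOld.getOrElse(msg.lastSold, Set.empty[String]) + msg.id
    LastSoldIndex(withoutOld.updated(msg.lastSold, bucket), current.updated(msg.id, msg.lastSold))
  }

  // Ids with from <= lastSold < until: a range scan over the tree, not a full scan.
  def soldBetween(from: LocalDate, until: LocalDate): Set[String] =
    byDate.range(from, until).values.flatten.toSet
}
```

Each update is cheap on its own. However, every write to an Item now fans out into one notification per affected index, which is the cost described in the downside below.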

Downside: A new persistent actor for every kind of query. A flurry of 
messages for every write to update the various views and indices. Annoying 
and expensive to bootstrap if you introduce new queries to an existing 
system. Events (in event-sourcing parlance) are stored multiple times: e.g. 
ItemSold is stored in Item to update lastSold and in LastSoldActor to 
update the index. Overall still feels fairly expensive in terms of 
resources (memory, storage, cpu time, programmer time).

This approach is basically where we're at now. Instead of having one actor 
per query, you could have a single "parent" actor that stores a list of all 
persistenceIds as well as various filtered or sorted indices over them. 
It's conceivable to get around the persistence requirement for the query 
actors (and some downsides along with it) by essentially recreating the 
partitions, views and indices from scratch every time, but that's probably 
prohibitively expensive. And it seems as if you're basically 
re-implementing a database on top of akka-persistence. Improvements to 
the query side of akka-persistence may make this approach less 
work-intensive, and maybe there is another library that can take over some 
of the work.

*Approach #3: Replicate the data into a second database to serve the views*

All state updates are replicated (asynchronously) to a second database. The 
actors and their journal remains the definite record, and transactions rely 
on those to prevent e.g. selling an item that's out of stock. E.g. queries 
B, C and D are trivially implemented in SQL- and NoSQL-DBs, you can easily 
add indices whenever you want. Also solves the current (but eventually to 
be solved) lack of views on multiple persistent actors.
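Whatever the second database is, the replication step is a projection. It reads events from the journal in order and upserts one denormalised row per Item. The sketch below uses a Map as a stand-in for that table, with hypothetical Envelope, Row and ReadModel types. Tracking the last applied sequence number per Item is an assumption about how one might tolerate redelivered events during asynchronous replication. It is not something akka-persistence does for you. The query methods correspond roughly to simple SQL WHERE clauses.

```scala
import java.time.LocalDate

// Hypothetical stand-in for the post's SomeEnum.
sealed trait Kind
case object Fruit extends Kind
case object Vegetable extends Kind

// Hypothetical Item events, as stored in the journal.
sealed trait ItemEvent
final case class ItemCreated(name: String, kind: Kind, amount: Int) extends ItemEvent
final case class ItemSold(quantity: Int, on: LocalDate) extends ItemEvent

// Hypothetical envelope: which Item an event belongs to and its position in that Item's journal.
final case class Envelope(persistenceId: String, sequenceNr: Long, event: ItemEvent)

// One denormalised row per Item, the shape an "items" table in the view DB might have.
final case class Row(name: String, kind: Kind, amountAvailable: Int, lastSold: Option[LocalDate], seenSeqNr: Long)

// In-memory stand-in for the view database. Rows may lag behind the journal,
// which stays the definitive record; seenSeqNr lets redelivered events be skipped.
final case class ReadModel(rows: Map[String, Row] = Map.empty) {
  def project(e: Envelope): ReadModel = {
    val current = rows.get(e.persistenceId)
    if (current.exists(_.seenSeqNr >= e.sequenceNr)) this // already applied
    else {
      val next: Option[Row] = (current, e.event) match {
        case (None, ItemCreated(n, k, a)) => Some(Row(n, k, a, None, e.sequenceNr))
        case (Some(r), ItemSold(q, on)) =>
          Some(r.copy(amountAvailable = r.amountAvailable - q, lastSold = Some(on), seenSeqNr = e.sequenceNr))
        case _ => current // unexpected combination: left unchanged in this sketch
      }
      next.fold(this)(row => ReadModel(rows.updated(e.persistenceId, row)))
    }
  }

  // Roughly: SELECT id FROM items WHERE kind = ?
  def ofKind(k: Kind): Set[String] = rows.collect { case (id, r) if r.kind == k => id }.toSet
  // Roughly: SELECT id FROM items WHERE amount_available = 0
  def soldOut: Set[String] = rows.collect { case (id, r) if r.amountAvailable == 0 => id }.toSet
  // Roughly: SELECT id FROM items WHERE last_sold >= ? AND last_sold < ?
  def soldBetween(from: LocalDate, until: LocalDate): Set[String] =
    rows.collect { case (id, r) if r.lastSold.exists(d => !d.isBefore(from) && d.isBefore(until)) => id }.toSet
}
```

In a real setup the three query methods would be the database's job, backed by its own indices. Only `project` would remain as application code, and it has to change whenever the Item's attributes do.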

Downside: This also duplicates all data. It seems like sort of a big hammer 
to solve the problem. You need to update the schema in multiple places when 
you add or remove attributes. Unless you use the same DB as persistence 
journal and as view DB, you've now got two distributed DBs to administer. 
Are there more downsides I'm missing?

*Approach #4: Transform the event journals themselves to serve the views*

Obviously the persistent actors append their events to an existing DB. So 
the DB has -- in theory -- all the information required to answer 
queries A-D. You'd need to tell it how to interpret the events, though, and 
how the queries relate to them, and do it all efficiently. Event Store 
projections may be able to do some or all of this.

Downside: You need to write and maintain code in another language -- 
possibly JavaScript. Either that code will almost duplicate the code in your 
persistent actors (interpret an individual item actor's events to get a 
(virtual?) representation of the actor's current state, which can then be 
queried), or, more likely, it will be wildly different (e.g. interpret all 
item actors' events specifically to answer query C).
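Event Store projections would be written in JavaScript. To stay in one language, here is the "wildly different" kind of projection sketched in Scala instead. It reads all Items' events as a single, globally ordered stream, modelled as a hypothetical list of (persistenceId, event) pairs. It keeps only what query C needs, the remaining stock per Item, rather than reconstructing each Item's full state. The event types are hypothetical and stripped down to the fields C uses.

```scala
// Hypothetical Item events, reduced to what query C needs.
sealed trait ItemEvent
final case class ItemCreated(amount: Int) extends ItemEvent
final case class ItemSold(quantity: Int) extends ItemEvent
final case class ItemRestocked(quantity: Int) extends ItemEvent

// Hypothetical journal entry: (persistenceId, event), in global journal order.
type JournalEntry = (String, ItemEvent)

// A query-specific projection for C: it tracks remaining stock per Item and nothing else.
final case class SoldOutProjection(stock: Map[String, Int] = Map.empty) {
  def handle(entry: JournalEntry): SoldOutProjection = entry match {
    case (id, ItemCreated(a))   => SoldOutProjection(stock.updated(id, a))
    case (id, ItemSold(q))      => SoldOutProjection(stock.updated(id, stock.getOrElse(id, 0) - q))
    case (id, ItemRestocked(q)) => SoldOutProjection(stock.updated(id, stock.getOrElse(id, 0) + q))
  }

  // Answer to query C: all Items whose remaining stock is exactly zero.
  def soldOut: Set[String] = stock.collect { case (id, 0) => id }.toSet
}
```

The trade-off shows up as soon as a second query is added. D would need a projection of its own. By contrast, the "reconstruct each actor" style can answer any query, at the cost of holding full state for every Item.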

Further reading:
 Reactive DDD with Akka, part 3 (the author chooses approach #3) -- 
http://pkaczor.blogspot.de/2014/06/reactive-ddd-with-akka-projections.html
 EventStore documentation -- https://github.com/eventstore/eventstore/wiki
 Akka Persistence on the Query Side: The Conclusion -- 
https://groups.google.com/forum/#!topic/akka-user/MNDc9cVG1To

There's nothing stopping you from mixing different approaches, either 
side by side (e.g. use a DB for one kind of query and pure akka-persistence 
for another) or on top of one another (e.g. use akka-persistence views to 
fill the database). And there may be more approaches (Spark? use a non-Akka, 
in-memory, read-only data structure? send it all to Amazon Mechanical Turk 
and let real humans sort it out?) -- TIMTOWTDI strikes again.

At the moment we're not sure how to proceed. Is the above a reasonable 
way to talk about the various approaches? Are all of them viable? Is there 
a sane default approach?

Cheers
Moritz
