Re: [akka-user] Query strategies for akka-persistence

Patrik Nordwall Fri, 24 Oct 2014 04:17:54 -0700

Hi Moritz,

Thanks for an excellent write-up. Comments inline...


On Thu, Oct 23, 2014 at 5:09 PM, Moritz Schallaböck <
[email protected]> wrote:

> Hello fellow hakkers,
>
> we're developing a proof-of-concept application to evaluate akka and
> akka-persistence, and we've stumbled upon a maze of twisty little passages,
> all alike. ;) We've got the command/storage side of the CQRS paradigm
> implemented, nicely sharded with akka-cluster, and are now worrying about
> dealing with the query side in an efficient manner. Specifically, we're
> wondering how to respond to "complex" queries that are predicated on the
> state wrapped by the actors. We've found several approaches that could
> conceivably work.
>
> Let's say we have a web shop with a single aggregate root
>  *Item(val persistenceId: String, val name: String, val kind: SomeEnum,
> var amountAvailable: Integer, var lastSold: Date)*
>
> Given its persistenceId, we can easily talk to the actor representing an
> item by messaging the shard region which will forward the message, waking
> up the actor if necessary. Let's say the actor responds to some query
> message with MyState(name, kind, amountAvailable).
>
> Now we need to be able to respond to queries such as A) "give me all
> items", B) "give me all fruits" (kind==Fruit), C) "give me all sold out
> items" (amountAvailable==0), D) "give me all items sold in February and
> March". Let's say it's sufficient to get a list of persistenceIds as a
> result to these queries. A is the "trivial" catch-all, B refers to actor
> state that is immutable, C and D refer to dynamically changing actor state.
> B partitions the entire set based on a simple predicate. The same is true
> for C if you only care about sold out/not sold out. D represents an index
> over the set of Items.
>

Great example!


>
> *Approach #1: Do it all in akka and akka-persistence, naively*
>
>  A) Have a second kind of persistent actor that can respond with all
> assigned persistence ids. We get this "for free" from the way we assign
> persistence ids in the first place: we increment a long id, maxAssigned,
> when creating a new Item. The actor responds with a List(0, 1, ..., maxAssigned).
>  B) Get ALL persistenceIds using A) and send a query for the state to each
> Item, filter based on the kind.
>  C) and D) The same trivial solution as in B) would work here, too.
>
> Downside: Each time we query, we might wake up ALL existing Items,
> replaying their journal. This is probably not a good idea even if all Items
> fit into memory. Each time we query we also message all actors and process
> all replies. That sounds like a lot of work, even if it's just two lines of
> Scala code!
>

Yes, this will not scale.
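
To make the cost visible, here is a minimal sketch of approach #1 in plain Scala, without Akka. The `ask` function is a hypothetical stand-in for messaging one Item actor through the shard region, and the `Kind` values are assumed. Every query touches every id from 0 to maxAssigned, whatever the predicate is.

```scala
object NaiveQuery {
  // Assumed values for the SomeEnum mentioned in the post.
  sealed trait Kind
  case object Fruit extends Kind
  case object Vegetable extends Kind

  // The reply an Item actor gives to a state query (MyState in the post).
  final case class MyState(name: String, kind: Kind, amountAvailable: Int)

  // Ask every assigned id for its state and keep the ids that match.
  // In the real system each call to `ask` may wake an actor and replay
  // its journal, so the cost grows with the total number of Items.
  def queryAll(maxAssigned: Long, ask: Long => MyState)(p: MyState => Boolean): List[Long] =
    (0L to maxAssigned).filter(id => p(ask(id))).toList
}
```

Queries B and C then differ only in the predicate, for example `_.kind == NaiveQuery.Fruit` or `_.amountAvailable == 0`.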


>
> *Approach #2: Do it all in akka/akka-persistence but re-implement half of
> a database on top of it*
>
>  A) Same as above.
>  B) Have several persistent KindActors(val kind: SomeEnum) that are
> messaged whenever a new Item of their kind is created, and store all
> persistenceIds in a list. Given a way to get the KindActor for fruit, you
> can easily get the persistenceIds of fruits.
>  C) Have a persistent SoldOutActor that is messaged whenever the
> amountAvailable of an Item changes and adds the item to, or removes it
> from, a set of sold out items.
>  D) Have a persistent LastSoldActor that is messaged whenever lastSold
> changes and keeps the tuples (persistenceId, lastSold) in a search tree.
> This actor can efficiently respond to queries (from, to) with all
> persistenceIds where from<lastSold<to.
>
> Downside: A new persistent actor for every kind of query. A flurry of
> messages for every write to update the various views and indices. Annoying
> and expensive to bootstrap if you introduce new queries to an existing
> system. Events (in event-sourcing parlance) are stored multiple times: e.g.
> ItemSold is stored in Item to update lastSold and in LastSoldActor to
> update the index. Overall still feels fairly expensive in terms of
> resources (memory, storage, cpu time, programmer time).
>
> This approach is basically where we're at now. Instead of having one actor
> each, you could have a single "parent" actor that stores a list of all
> persistenceIds as well as various predicated or sorted indices to them.
> It's conceivable to get around the persistence requirement for the query
> actors (and some downsides along with it) by essentially recreating the
> partitions, views and indices from scratch every time, but that's probably
> prohibitively expensive. And it seems as if you're basically
> re-implementing a database on top of akka-persistence. The improvements of
> the query side of akka-persistence may make this approach less
> work-intensive. And maybe there is another library that can remove some of
> the workload.
>

This will be complicated, and not efficient.
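
As an illustration of how much bookkeeping even one such index needs, here is a rough sketch of the state a LastSoldActor might hold, in plain Scala without Akka or persistence. The `ItemSold` name comes from the post; its fields, the `Index` type and the half-open range (from inclusive, until exclusive) are assumptions of this sketch.

```scala
import java.time.LocalDate
import scala.collection.immutable.TreeMap

object LastSoldIndex {
  // Explicit ordering so the sketch does not rely on implicit resolution.
  private val byDay: Ordering[LocalDate] =
    Ordering.by((d: LocalDate) => d.toEpochDay)

  // Event the index is told about (fields are assumed).
  final case class ItemSold(persistenceId: String, soldOn: LocalDate)

  final case class Index(
      byDate: TreeMap[LocalDate, Set[String]], // sorted: lastSold -> ids
      lastSold: Map[String, LocalDate]) {      // reverse lookup: id -> lastSold

    // Move the item from its previous lastSold bucket to the new one.
    def updated(e: ItemSold): Index = {
      val cleared = lastSold.get(e.persistenceId) match {
        case Some(old) =>
          val rest = byDate.getOrElse(old, Set.empty[String]) - e.persistenceId
          if (rest.isEmpty) byDate - old else byDate.updated(old, rest)
        case None => byDate
      }
      val bucket = cleared.getOrElse(e.soldOn, Set.empty[String]) + e.persistenceId
      Index(cleared.updated(e.soldOn, bucket), lastSold.updated(e.persistenceId, e.soldOn))
    }

    // All ids with from <= lastSold < until, read from the sorted map.
    def soldBetween(from: LocalDate, until: LocalDate): Set[String] =
      byDate.range(from, until).values.flatten.toSet
  }

  val empty: Index = Index(TreeMap.empty[LocalDate, Set[String]](byDay), Map.empty)
}
```

Query D ("sold in February and March") becomes `soldBetween(LocalDate.of(2014, 2, 1), LocalDate.of(2014, 4, 1))`. The reverse map is only there so that an item can be taken out of its old bucket, which is the kind of detail a database index handles for you.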


>
> *Approach #3: Replicate the data into a second database to serve the views*
>
> All state updates are replicated (asynchronously) to a second database.
> The actors and their journal remain the definitive record, and transactions
> rely on those to prevent e.g. selling an item that's out of stock. E.g.
> queries B, C and D are trivially implemented in SQL- and NoSQL-DBs, you can
> easily add indices whenever you want. Also solves the current (but
> eventually to be solved) lack of views on multiple persistent actors.
>
> Downside: Also duplicates all data. Seems like a sort-of big hammer to
> solve the problem. Need to update the schema in multiple places when you
> add or remove attributes. Unless you use the same DB as persistence journal
> and as a view DB, you've now got two distributed DBs to administrate. I'm
> not sure if there are more downsides?
>
>
I think this is a good way of doing it. I don't see it as a downside to use
another database for the queries. Queries have very different
characteristics than the event log. With this solution you will be able to
have efficient writes AND efficient queries. The query database(s) might be
de-normalized to fit the various types of queries.
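
A minimal sketch of the idea, in plain Scala with an in-memory Map standing in for the query database. `ItemSold` is named in the original post; `ItemCreated`, the event fields, `Row` and the function names are assumptions made for illustration, not an akka-persistence API.

```scala
object ReadSide {
  // Events as the Item actors might journal them (fields are assumed).
  sealed trait ItemEvent
  final case class ItemCreated(id: String, name: String, kind: String, amount: Int) extends ItemEvent
  final case class ItemSold(id: String, quantity: Int) extends ItemEvent

  // One de-normalized row per item, standing in for a table in the query DB.
  final case class Row(name: String, kind: String, amountAvailable: Int)
  type Table = Map[String, Row]

  // Apply one replicated event. Against a real database this would be an
  // INSERT or UPDATE keyed by persistenceId.
  def project(table: Table, event: ItemEvent): Table = event match {
    case ItemCreated(id, name, kind, amount) =>
      table.updated(id, Row(name, kind, amount))
    case ItemSold(id, quantity) =>
      table.get(id) match {
        case Some(row) =>
          table.updated(id, row.copy(amountAvailable = row.amountAvailable - quantity))
        case None => table // event for an unknown item: ignored in this sketch
      }
  }

  // Query B, roughly "WHERE kind = ?".
  def byKind(table: Table, kind: String): Set[String] =
    table.collect { case (id, row) if row.kind == kind => id }.toSet

  // Query C, roughly "WHERE amountAvailable = 0".
  def soldOut(table: Table): Set[String] =
    table.collect { case (id, row) if row.amountAvailable == 0 => id }.toSet
}
```

The write side never reads this table. It is rebuilt or extended purely by folding events through `project`, so a new query usually means a new column or index in the query database rather than a new persistent actor.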


> *Approach #4: Transform the event journals themselves to serve the views*
>
> Obviously the persistent actors append their events into an existing DB.
> So the DB has -- in theory -- all the requisite information to answer the
> queries A-D. You'd need to tell it how to interpret the events, though, and
> how the queries relate to them. And do it all in an efficient manner.
> Eventstore projections may be able to do some or all of this.
>

Yes, eventstore projections are very interesting. When "Akka Persistence on
the Query Side: The Conclusion" is implemented I envision that some of this
can be implemented by an akka-persistence journal. To be efficient the
query must still run in the data store, and different backend data stores
will have different capabilities.


>
> Downside: Need to write and maintain code in another language -- possibly
> Javascript. Either the code will be almost a duplicate of the code in your
> persistent actors (interpret an individual item actor's events to get a
> (virtual?) representation of the actor's current state, which can be
> queried), or, more probably, it will be wildly different (e.g. interpret
> all item actors' events to be able to answer specifically query C).
>
> Further reading:
>  Reactive DDD with Akka part 3, he chooses approach 3 --
> http://pkaczor.blogspot.de/2014/06/reactive-ddd-with-akka-projections.html
>  EventStore documentation -- https://github.com/eventstore/eventstore/wiki
>  Akka Persistence on the Query Side: The Conclusion --
> https://groups.google.com/forum/#!topic/akka-user/MNDc9cVG1To
>
> There's nothing stopping you from mixing different approaches either
> side-by-side (e.g. use a DB for one kind of query and pure akka-persistence
> for another) or on-top-of-another (e.g. use akka-persistence views to fill
> the database). And there may be more approaches (Spark? use a non-Akka,
> in-memory, read-only data structure? send it all to Amazon Mechanical Turk
> and let real humans sort it out?) -- TIMTOWTDI strikes again.
>
> At the moment we're not sure how to proceed - is the above a reasonable
> way to talk about the various approaches? Are all of them viable; is there
> a sane default approach?
>

I think #3 and #4 are viable. #3 is probably easier to implement today.

Cheers,
Patrik


>
> Cheers
> Moritz
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>



-- 

Patrik Nordwall
Typesafe <http://typesafe.com/> -  Reactive apps on the JVM
Twitter: @patriknw
