[akka-user] Re: Akka Persistence on the Query Side: The Conclusion

Markus H Sun, 31 Aug 2014 06:23:48 -0700

Hi Roland,

sounds great that you are pushing for the whole CQRS story.

I'm just experimenting with akka and CQRS and have no production 
experience, but I'm thinking about the concepts since some time. So please 
take my comments with a big grain of salt and forgive me for making it 
sound somewhat like a wish list. But I think if there is a time for wishes, 
it might be now.

I feel the need to distinguish a little more between commands, querys, 
reads and writes. 
In a CQRS setup, I (conceptually) see three parts:
    (1) the Command side: mainly eventsourced PersistentActors that build 
little islands of consistency for changing the application state
    (2) the link between the Command and Query side: the possibility to 
make use of all or a subset of all events/messages written in (1) for 
building an optimized query side (3) or even for updating other islands of 
consistency on the Command side (other aggregates or bounded contexts) in 
(1)
    (3) the Query side: keeping an eventually consistent view of the 
application state in any form that is suitable for fast application queries

>From the 10000-foot view, we write to (1) and read from (3) but each of 
(1),(2) and (3) has its own reads and writes:
    (1) writes: -messages produced by the application to be persisted
        reads: -consistent read of the persisted messages for actor replay
           -stream of all messages (I think, that's what you mean by 
SuperscalableTopic)

    (2) writes: -at least conceptually: all messages of the all-messages 
stream of (1)
        reads: -different subsets of the all-messages stream that make 
sense to different parts of our application

    (3) writes: -any query-optimized form of our data that was read of some 
sub-stream of (2)
        reads: -whatever query the query-side datastore allows (SQL, 
fulltext searches, graph walks etc.)

While (1) is the current akka persistence implementation plus a way to get 
all messages, (2) is more like the Event Bus (though I would name it 
differently) in this picture from the axonframework documentation 
(http://www.axonframework.org/docs/2.0/images/detailed-architecture-overview.png).

(1) and (2) could be done by one product like what eventstore does with its 
projections or it could be different products like Cassandra for (1) and 
Kafka for (2). (3) could be anything that holds data.

Some more detail:

On (1): 
As said before, the command side is fine as it is today to put messages 
into the datastore and get them out again for persistent actors. I 
definitely would consider replaying of messages for persistent actors part 
of the command side, since the command side in itself has stronger 
consistency requirements (consistentency within an aggregate) than the 
query side.        

Additionally, as Ashley wrote, any actor should be able to push messages to 
the all-messages stream. In contrast to persistent actors I dont' see any 
need for replay here. Therefore and for other reasons (like message 
deduplication in (2)) I would like to propose adding an extra unique ID 
(UUID?) for any message handled by the command side, independent of an 
actors' persistenceId (which would be needed for replays nevertheless).

Also, I see the need to provide some guarantees for the all-messages 
stream. I would consider an ordering guarantee for messages from the same 
actor and an otherwise (at least roughly) timestamp based sorting a good 
compromise. This would also be comparable to the guarantees that akka 
provides for message sending. Ideally, the order stays the same for 
repeated reads of the all-messages stream. With the guarantees mentioned 
before, if the datastore keeps all messages of an actor on the same node, 
the all-messages stream could even be created per datastore node.

On (2): 
As mentioned above, I see the QueryByPersistenceId as part of (1) as it 
requires stronger consistency guarantees. All other QueryByWhatever are all 
about the question, how to retrieve the right subset of messages from the 
all-messages stream for the application and its domain. This of course 
differs by application and domain. Therefore I like Martin's 
QueryByStream(name, ...), where a stream is any subset of messages the 
application cares about. 

I also think it should not be up to the datastore to decide what streams to 
offer. I also can't really imagine how this should work in most datastores. 
While there might be some named index on top of JSON messages in MongoDB 
that can be served as as stream, I don't see how to create a 
stream/view/index in Key-Value stores or RDBMS where a message is probably 
persisted as byte array whithout any knowledge of the application.

To tell the datastore what streams to offer, I would consider something 
like projections in eventstore or user-defined topics in Martin's 
kafka-persistence that are driven by the application. In the simplest form 
it could be a projection function like

Message => Seq[String]

that is applied to each message of the all-messages stream, which taskes a 
message and gives back Strings of sub-stream-IDs. This could be anything 
from the type-name to the persistenceId of an actor to a property of the 
message. So it feels a little like "tagging on the read side" or more 
precisely "tagging on the way from the command to the query side". New 
messages should be added to the sub-streams as they arrive in the 
all-messages stream. This could probably be done in any datastore that can 
keep another index based on message IDs (if IDs are added on the command 
side) and is adjustable with application needs.

The interface for (2) could allow new projections that could be run against 
the all-messages stream or any sub-stream at any later time. A 
PersistentView could just track one of the projected sub-streams. 

For datastore implementations this would bring the minimum requirement to 
construct sub-streams as told by the application and serve streams by name.

On (3):
On the query side of the application I see anything, that holds a view of 
the data written on the command side and allows querying/consuming by some 
criteria. As you wrote in your post, the variety of datastores is huge and 
I'm not sure if the set of common queries is very big. I like the idea of 
ad-hoc querying you proposed as it is a much more messaging like way to 
query datastores but I see disperate sets of standardized queries per 
datastore type (graph dbs, rdbms, key-value etc.) that focus on the 
datatsore conceps (tables and rows in rdbms, nodes and edges in graph dbs - 
something like QueryById(id, table) for RDBMS etc.) and not on messages. I 
would very much like to see this kind of datastore querying but I don't 
think it's neccessary for CQRS setups with akka.

As a user of a full akka-cqrs-solution the points to provide application 
logic could be
    - which messages to persist in (1)
    - which projections to make for the application in (2)
    - which project streams to consume from (2) in order to trigger 
anything from query-side updates to creation of new commands for other 
aggregates to updating a view in someone's browser

Kind regards
Markus

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

[akka-user] Re: Akka Persistence on the Query Side: The Conclusion

Reply via email to