Dear hakkers,

There have been several very interesting, educational and productive threads in 
the past weeks (e.g. here and here). We have taken some time to distill the 
essential problems as well as to discuss the proposed solutions, and below is 
my attempt at a summary. In the very likely case that I missed something, by 
all means please raise your voice. The intention for this thread is to end 
with a set of GitHub issues for making Akka Persistence as closely aligned 
with CQRS/ES principles as we can make it.

As Greg and others have confirmed, the write-side (PersistentActor) is already 
doing a very good job, so we do not see a need to change anything at this 
point. My earlier proposal of adding specific topics as well as the discussed 
labels or tags all feel a bit wrong since they benefit only the read-side and 
should therefore not be a concern/duty of the write-side.

On the read-side we came to the conclusion that PersistentView does nearly the 
right thing, but it focuses on the wrong aspect: it is best suited to tracking 
a single PersistentActor with some slack, and it does not treat back-pressure 
as a first-class citizen (back-pressure is possible to achieve, albeit not 
trivially). What we distilled as the core functionality for a read-side actor 
is the following:

- it can ask for a certain set of events
- it consumes the resulting event stream on its own schedule
- it can be stateful and persistent on its own

This does not preclude populating e.g. a graph database or a SQL store directly 
from the journal back-end via Spark, but we do see the need to allow Akka 
Actors to be used to implement such a projection.

Starting from the bottom up, allowing the read-side to be a PersistentActor in 
itself means that receiving Events should not require a mixin trait like 
PersistentView. The next bullet point means that the Event stream must be 
properly back-pressured, and we have a technology under development that is 
predestined for such an endeavor: Akka Streams. So the proposal is that any 
Actor can obtain the ActorRef for a given Journal and send it a request for the 
event stream it wants, and in response it will get a message containing a 
stream (i.e. Flow) of events and some meta-information to go with it.
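To make the back-pressure point concrete, here is a minimal pull-based sketch 
in plain Scala: the consumer asks for the next batch only when it is ready, 
which is the essence of demand-driven consumption. This stands in for an Akka 
Streams Flow; the Event and EventSource names are illustrative assumptions, 
not part of any proposed API.

```scala
// Illustrative only: a stand-in for a back-pressured event stream.
final case class Event(persistenceId: String, seqNr: Long, payload: String)

// The journal side produces events only when the consumer signals demand.
final class EventSource(events: Vector[Event]) {
  private var pos = 0
  // Hand out at most `n` events; the consumer decides when to call this.
  def request(n: Int): Vector[Event] = {
    val batch = events.slice(pos, pos + n)
    pos += batch.size
    batch
  }
}
```

A consumer that is busy updating its own state simply refrains from calling 
`request` until it has capacity, so the journal never overwhelms it.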

The question that remains at this point is what exactly it means to “ask for a 
certain set of events”. In order to keep the number of abstractions minimal, 
the first use-case for this feature is the recovery of a PersistentActor. Each 
Journal will probably support different kinds of queries, but it must for this 
use-case respond to

case class QueryByPersistenceId(id: String, fromSeqNr: Long, toSeqNr: Long)

with something like

case class EventStreamOffer(metadata: Metadata, stream: Flow[PersistentMsg])

The metadata allows the recipient to correlate this offer with the 
corresponding request and it contains other information as we will see in the 
following.
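A sketch of how that correlation could look on the receiving side, assuming 
the metadata carries a request id (all names here are assumptions for 
illustration, not the proposed API):

```scala
// Illustrative: correlating an EventStreamOffer with the request that
// produced it, via a request id carried in the Metadata.
final case class Metadata(requestId: Long, deterministicReplay: Boolean)
final case class EventStreamOffer[A](metadata: Metadata, stream: Iterator[A])

final class RequestTracker {
  private var pending = Map.empty[Long, String] // requestId -> description
  private var nextId = 0L

  // Remember an outgoing request and return the id to embed in it.
  def register(description: String): Long = {
    val id = nextId; nextId += 1
    pending += id -> description
    id
  }

  // Match an incoming offer to a pending request, if we actually asked for it.
  def correlate[A](offer: EventStreamOffer[A]): Option[String] = {
    val d = pending.get(offer.metadata.requestId)
    pending -= offer.metadata.requestId
    d
  }
}
```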

Another way to ask for events was discussed as Topics or Labels or Tags in the 
previous threads, and the idea was that the generated stream of all events was 
enriched by qualifiers that allow the Journal to construct a materialized view 
(e.g. a separate queue that copies all events of a given type). This view then 
has a name that is requested from the read-side in order to e.g. have an Actor 
that keeps track of certain aspects of all persistent ShoppingCarts in a retail 
application. As I said above we think that this concern should be handled 
outside of the write-side because logically it does not belong there. Its 
closest cousin is the construction of an additional index or view within a SQL 
store, maintained by the RDBMS upon request from the DBA, but available to and 
relied upon by the read-side. We propose that this is also how this should work 
with Akka Persistence: the Journal is free to allow the configuration of 
materialized views that can be requested as event streams by name. The 
extraction of the indexing characteristics is performed by the Journal or its 
backing store, outside the scope of the Journal SPI; one example of doing it 
this way has been implemented by Martin already. We propose to access the 
auxiliary streams by something like

case class QueryKafkaTopic(name: String, fromSeqNr: Long, toSeqNr: Long)
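The view construction itself could be as simple as the following sketch: copy 
all events matching some predicate into a separately named, independently 
numbered queue that the read-side then requests by name. This is purely 
illustrative of the idea; the names and the in-memory representation are 
assumptions.

```scala
// Illustrative: a journal-side materialized view as a filtered copy of the
// log, with its own consecutive sequence numbers.
final case class Stored(seqNr: Long, payload: Any)

def materialize(log: Vector[Any], matches: Any => Boolean): Vector[Stored] =
  log.filter(matches).zipWithIndex.map { case (p, i) => Stored(i + 1L, p) }
```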

Sequence numbers are necessary for deterministic replay/consumption. We had 
long discussions about the scalability implications, which is the reason why we 
propose to leave such queries proprietary to the Journal backend. Assuming a 
perfectly scalable (but then of course not real-time linearizable) Journal, the 
query might allow only

case class QuerySuperscalableTopic(name: String, fromTime: DateTime)

This will try to give you all events that were recorded after the given 
moment, but replay will not be deterministic and there will be no unique 
sequence numbers. These properties are reflected in the Metadata that comes 
with the EventStreamOffer.
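One way the Metadata could advertise such replay guarantees, so a consumer 
can distinguish a deterministic QueryByPersistenceId replay from a best-effort 
QuerySuperscalableTopic stream (field and type names are assumptions for 
illustration):

```scala
// Illustrative: encoding the replay guarantee in the offered Metadata.
sealed trait ReplayGuarantee
case object DeterministicWithSeqNrs extends ReplayGuarantee
case object BestEffortByTime extends ReplayGuarantee

final case class Metadata(requestId: Long, guarantee: ReplayGuarantee)

// A consumer may resume from a stored sequence number only if replay
// is deterministic and uniquely numbered.
def mayResumeFromSeqNr(m: Metadata): Boolean = m.guarantee match {
  case DeterministicWithSeqNrs => true
  case BestEffortByTime        => false
}
```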

The last way to ask for events is to select them using an arbitrarily powerful 
query at runtime, probably with dynamic parameters so that it cannot be 
prepared or materialized while writing the log. Whether and how this is 
supported by the Journal depends on the precise back-end, and this is very much 
deliberate: we want to allow the Journal implementations to focus on different 
use-cases and offer different feature trade-offs. If an RDBMS is used, then 
things will naturally be linearized but less scalable, for example. A document 
database can support a different feature set than a store that keeps events 
as BLOBs in Oracle, etc. The user-facing API would be defined by each Journal 
implementation and could include

case class QueryEventStoreJS(javascriptCode: String)
case class QueryByProperty(jsonKey: String, value: String, since: DateTime)
case class QueryByType(clazz: Class[_], fromSeqNr: Long, toSeqNr: Long)
case class QueryNewStreams(fromSeqNr: Long, toSeqNr: Long)

The last one should elegantly solve the use-case of wanting to catalog which 
persistenceIds are valid in the Journal (which has been requested several times 
as well). As discussed for the SuperscalableTopic, each Journal would be free 
to decide whether it wants to implement deterministic replay, etc.
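Cataloging the persistenceIds from such a stream would then be a simple fold 
over "new stream started" events. The event shape below is an illustrative 
assumption of what QueryNewStreams might deliver:

```scala
// Illustrative: building a catalog of known persistenceIds from a stream
// of "new stream started" events.
final case class NewStreamEvent(seqNr: Long, persistenceId: String)

def catalog(events: Iterator[NewStreamEvent]): Set[String] =
  events.foldLeft(Set.empty[String])((acc, e) => acc + e.persistenceId)
```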

Properly modeling streams of events as Akka Streams feels like a consistent 
way forward; it also allows non-actor code to be employed for stream 
processing on the resulting event streams, including merging multiple of them 
or feeding events into Spark. The possibilities are boundless. I'm quite 
excited by this new perspective and look forward to your feedback on how well 
this helps Akka users implement the Q in CQRS.

Regards,


Dr. Roland Kuhn
Akka Tech Lead
Typesafe – Reactive apps on the JVM.
twitter: @rolandkuhn

