Hi Roland,
thanks for the summary, I think that's the right direction.
In your summary, the only query command type pre-defined in
akka-persistence is QueryByPersistenceId. I'd find it useful to further
pre-define other query command types in akka-persistence to cover the
most common use cases, such as:
- QueryByStreamDeterministic(name, from, to) (as a generalization of
QueryKafkaTopic, ... and maybe also QueryByPersistenceId)
- QueryByTypeDeterministic(type, from, to)
- QueryByStream(name, fromTime)
- QueryByType(type, fromTime)
Supporting these commands would still be optional, but it would give
plugin developers better guidance on which queries to support and,
more importantly, make it easier for applications to switch from one
plugin to another. Other, more specialized queries would still remain
plugin-specific, such as QueryByProperty, QueryDynamic(queryString), etc.
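For illustration, the four proposed pre-defined command types could be sketched as plain case classes. All names, field names and types here are my suggestion only (e.g. using Long epoch millis for time stamps), not a settled API:

```scala
// Hypothetical sketch of the proposed pre-defined query commands.
// Field types (Long sequence numbers, epoch-millis time stamps) are
// illustrative assumptions, not part of any agreed SPI.
sealed trait Query

// Deterministic replay within a sequence-number range.
case class QueryByStreamDeterministic(name: String, from: Long, to: Long) extends Query
case class QueryByTypeDeterministic(tpe: String, from: Long, to: Long) extends Query

// Non-deterministic queries starting from a point in time.
case class QueryByStream(name: String, fromTime: Long) extends Query
case class QueryByType(tpe: String, fromTime: Long) extends Query
```

A plugin could then pattern-match on the common Query trait and reject the variants it does not support.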
WDYT?
Cheers,
Martin
On 27.08.14 16:34, Roland Kuhn wrote:
Dear hakkers,
there have been several very interesting, educational and productive
threads in the past weeks (e.g. here
<https://groups.google.com/d/msg/akka-user/SL5vEVW7aTo/KfqAXAmzol0J> and
here
<https://groups.google.com/d/msg/akka-user/4kbYcwWS2OI/hpmAkxnB9D4J>).
We have taken some time to distill the essential problems as well as
discuss the proposed solutions and below is my attempt at a summary.
In the very likely case that I missed something, by all means please
raise your voice. The intention for this thread is to end with a set
of github issues for making Akka Persistence as closely aligned with
CQRS/ES principles as we can make it.
As Greg and others have confirmed, the write-side (PersistentActor) is
already doing a very good job, so we do not see a need to change
anything at this point. My earlier proposal of adding specific topics
as well as the discussed labels or tags all feel a bit wrong since
they benefit only the read-side and should therefore not be a
concern/duty of the write-side.
On the read-side we came to the conclusion that PersistentView does
nearly the right thing, but it focuses on the wrong aspect: it seems
most suited to tracking a single PersistentActor with some slack, and
it does not treat back-pressure as a first-class citizen (it is
possible to achieve, albeit not trivially). What we distilled as the
core functionality for a read-side actor is the following:
* it can ask for a certain set of events
* it consumes the resulting event stream on its own schedule
* it can be stateful and persistent on its own
This does not preclude populating e.g. a graph database or a SQL store
directly from the journal back-end via Spark, but we do see the need
to allow Akka Actors to be used to implement such a projection.
Starting from the bottom up, allowing the read-side to be a
PersistentActor in itself means that receiving Events should not
require a mixin trait like PersistentView. The next bullet point means
that the Event stream must be properly back-pressured, and we have a
technology under development that is predestined for such an endeavor:
Akka Streams. So the proposal is that any Actor can obtain the
ActorRef for a given Journal and send it a request for the event
stream it wants, and in response it will get a message containing a
stream (i.e. Flow) of events and some meta-information to go with it.
The question that remains at this point is what exactly it means to
“ask for a certain set of events”. In order to keep the number of
abstractions minimal, the first use-case for this feature is the
recovery of a PersistentActor. Each Journal will probably support
different kinds of queries, but it must for this use-case respond to
case class QueryByPersistenceId(id: String, fromSeqNr: Long, toSeqNr: Long)
with something like
case class EventStreamOffer(metadata: Metadata, stream: Flow[PersistentMsg])
The metadata allows the recipient to correlate this offer with the
corresponding request and it contains other information as we will see
in the following.
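To make the request/response exchange concrete, here is a self-contained sketch of the protocol. In the real proposal `stream` would be an Akka Streams Flow of persistent messages delivered by the Journal ActorRef; to keep the sketch runnable without Akka it is stubbed as a Seq, and the Metadata fields are purely my assumption:

```scala
// Self-contained sketch of the proposed journal query protocol.
// `stream` stands in for an Akka Streams Flow; Metadata's shape
// is an assumption made for illustration.
case class PersistentMsg(persistenceId: String, seqNr: Long, payload: Any)
case class Metadata(persistenceId: String, fromSeqNr: Long, toSeqNr: Long)

case class QueryByPersistenceId(id: String, fromSeqNr: Long, toSeqNr: Long)
case class EventStreamOffer(metadata: Metadata, stream: Seq[PersistentMsg])

// A toy in-memory "journal" answering the recovery query:
// select the events of one persistenceId within the requested range.
def query(journal: Map[String, Seq[PersistentMsg]],
          q: QueryByPersistenceId): EventStreamOffer = {
  val events = journal.getOrElse(q.id, Seq.empty)
    .filter(m => m.seqNr >= q.fromSeqNr && m.seqNr <= q.toSeqNr)
  EventStreamOffer(Metadata(q.id, q.fromSeqNr, q.toSeqNr), events)
}

// Usage: recover a slice of one actor's events.
val journal = Map("cart-1" -> Seq(
  PersistentMsg("cart-1", 1L, "created"),
  PersistentMsg("cart-1", 2L, "item-added"),
  PersistentMsg("cart-1", 3L, "checked-out")))
val offer = query(journal, QueryByPersistenceId("cart-1", 2L, 3L))
```

The recipient correlates `offer.metadata` with its original request and then consumes the stream at its own pace.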
Another way to ask for events was discussed as Topics or Labels or
Tags in the previous threads, and the idea was that the generated
stream of all events was enriched by qualifiers that allow the Journal
to construct a materialized view (e.g. a separate queue that copies
all events of a given type). This view then has a name that is
requested from the read-side in order to e.g. have an Actor that keeps
track of certain aspects of all persistent ShoppingCarts in a retail
application. As I said above we think that this concern should be
handled outside of the write-side because logically it does not belong
there. Its closest cousin is the construction of an additional index
or view within a SQL store, maintained by the RDBMS upon request from
the DBA, but available to and relied upon by the read-side. We propose
that this is also how this should work with Akka Persistence: the
Journal is free to allow the configuration of materialized views that
can be requested as event streams by name. The extraction of the
indexing characteristics is performed by the Journal or its backing
store, outside the scope of the Journal SPI; one example of doing it
this way has been implemented by Martin
<https://github.com/krasserm/akka-persistence-kafka/#user-defined-topics> already.
We propose to access the auxiliary streams by something like
case class QueryKafkaTopic(name: String, fromSeqNr: Long, toSeqNr: Long)
Sequence numbers are necessary for deterministic replay/consumption.
We had long discussions about the scalability implications, which is
the reason why we propose to leave such queries proprietary to the
Journal backend. Assuming a perfectly scalable (but then of course not
real-time linearizable) Journal, the query might allow only
case class QuerySuperscalableTopic(name: String, fromTime: DateTime)
This will try to give you all events that were recorded after the
given moment, but replay will not be deterministic and there will be
no unique sequence numbers. These properties will be reflected in the
Metadata that comes with the EventStreamOffer.
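One way the offer's Metadata could signal these properties to the consumer is a simple flag plus an optional sequence-number origin; this shape is purely my assumption, for illustration:

```scala
// Hypothetical Metadata variant distinguishing deterministic replays
// (which carry unique sequence numbers) from superscalable,
// time-based ones (which do not). Field names are assumptions.
case class Metadata(streamName: String,
                    deterministic: Boolean,
                    fromSeqNr: Option[Long])

val deterministicReplay = Metadata("shopping-carts", deterministic = true,  fromSeqNr = Some(1L))
val superscalable       = Metadata("shopping-carts", deterministic = false, fromSeqNr = None)
```

A read-side actor could inspect `deterministic` to decide whether it may rely on exact replay, or must instead tolerate reordering and duplicates.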
The last way to ask for events is to select them using an arbitrarily
powerful query at runtime, probably with dynamic parameters so that it
cannot be prepared or materialized while writing the log. Whether and
how this is supported by the Journal depends on the precise back-end,
and this is very much deliberate: we want to allow the Journal
implementations to focus on different use-cases and offer different
feature trade-offs. If an RDBMS is used, for example, queries will
naturally be linearized but less scalable; a document database allows
a different set of features to be extracted than storing BLOBs in
Oracle does. The user-facing API would be defined by each Journal
implementation and could include
case class QueryEventStoreJS(javascriptCode: String)
case class QueryByProperty(jsonKey: String, value: String, since: DateTime)
case class QueryByType(clazz: Class[_], fromSeqNr: Long, toSeqNr: Long)
case class QueryNewStreams(fromSeqNr: Long, toSeqNr: Long)
The last one should elegantly solve the use-case of wanting to catalog
which persistenceIds are valid in the Journal (which has been
requested several times as well). As discussed for the
SuperscalableTopic, each Journal would be free to decide whether it
wants to implement deterministic replay, etc.
Properly modeling streams of events as Akka Streams feels like a
consistent way forward; it also allows non-actor code to be employed
for stream processing on the resulting event streams, including
merging several of them or feeding events into Spark. The
possibilities are boundless. I’m quite excited by this new perspective
and look forward to your feedback on how well this helps Akka users
implement the Q in CQRS.
Regards,
*Dr. Roland Kuhn*
/Akka Tech Lead/
Typesafe <http://typesafe.com/> – Reactive apps on the JVM.
twitter: @rolandkuhn
<http://twitter.com/#%21/rolandkuhn>
--
Martin Krasser
blog: http://krasserm.blogspot.com
code: http://github.com/krasserm
twitter: http://twitter.com/mrt1nz