Talking about the pattern in the subject 
<http://doc.akka.io/docs/akka/2.4.7/scala/persistence-query.html#Materialize_view_to_Reactive_Streams_compatible_datastore>:
I can't find a clean way of managing the lifecycle of persistence query 
streams that save persisted events to a read datastore (though for our 
purposes they could be doing anything else).

The problem is: persistent entities often have a lifecycle. They are 
created, passivated (possibly via sharding), and they can be woken up 
again/replayed.

Let's take an example: say we are managing users, and our aggregate root is 
a "User" actor. We could have 100 million users (yay) but only 100k active 
concurrently. 
We don't want to keep everything up and running in our cluster - especially 
old, inactive users - so we passivate the instances after inactivity, when 
they are "disabled" by an admin, when they deregister, or for whatever 
other reason.

Let's now say we have a persistence query that, for each user (via its 
persistenceId), publishes all the events to a read datastore (e.g. 
UserEmailConfirmed, UserNameUpdated, UserLoggedIn, etc.). It starts when 
the user starts, and then it runs on, decoupled from the actor. How can I 
avoid keeping in memory all the streams for all the users ever started? 
Since the user has a lifecycle, the persistence query has one too: when do 
I stop/restart it?
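To make the problem concrete, here is a minimal pure-Scala model of it (no Akka involved; `StreamRegistry` and its method names are hypothetical, just for illustration): if every entity start idempotently materializes a projection stream for its persistenceId and nothing ever stops one, the set of live streams tracks every user *ever* started, not the currently active ones.

```scala
// Hypothetical model of per-entity projection streams: each started
// entity registers a query stream keyed by its persistenceId, and
// nothing ever removes it - the unbounded-growth problem described above.
final class StreamRegistry {
  private var running = Set.empty[String] // persistenceIds with a materialized stream

  // Called whenever a User actor (re)starts: idempotently start its projection.
  def ensureStarted(persistenceId: String): Unit =
    running += persistenceId

  def liveStreamCount: Int = running.size
}

object StreamRegistryDemo {
  def main(args: Array[String]): Unit = {
    val registry = new StreamRegistry
    // 100k users wake up over time...
    (1 to 100000).foreach(i => registry.ensureStarted(s"user-$i"))
    // ...some are passivated and wake up again (no growth from replays)...
    (1 to 1000).foreach(i => registry.ensureStarted(s"user-$i"))
    // ...but the streams of long-passivated users are still materialized.
    println(registry.liveStreamCount) // prints 100000
  }
}
```

With real `eventsByPersistenceId` streams in place of the registry entries, that count is 100k live stream materializations with no obvious place to tear them down.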

I thought of different things, most of which I don't like:

   1.  Have special events that signal the termination of the actor, so 
   that when the query stream sees one, it stops. This would poison the 
   domain model with something completely irrelevant to it.
   2.  Have a "manager" actor that starts the persistent entity and the 
   persistence query at the same time, watches the entity and stops the stream 
   using a killswitch when the parent terminates (after a delay, because the 
   last few events could not have bene processed yet from the decoupled query, 
   because of polling or backpressure). This is unreliable and really bad from 
   a design point of view.
   3. Tag all events and run a single stream: not feasible on a cluster, 
   because every node would process the same events.
   4. (Had this idea just now) Tag events with a sort of sharding id, so 
   that we have, say, 100 tags and 100 streams persistently running, 
   distributed over the cluster (e.g. driven from actors sharded using the 
   same sharding id/tag)
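Idea 4 can be sketched without any Akka machinery: derive a small, fixed number of shard tags from the persistenceId, the same way Cluster Sharding derives a shard id from an entity id. This is only a sketch under assumptions - the tag scheme, `numberOfTagShards = 100`, and the object names are mine - and the commented lines show where a write-side event adapter and the read-side `eventsByTag` streams would plug in:

```scala
object ShardTagging {
  // Assumption: a fixed number of tag "shards"; 100 matches the idea above.
  val numberOfTagShards = 100

  // Derive a stable shard tag from the persistenceId, so the same user's
  // events always land in the same tag (and hence the same stream).
  def shardTag(persistenceId: String): String =
    s"user-shard-${math.abs(persistenceId.hashCode % numberOfTagShards)}"

  // Write side (sketch): an EventAdapter would wrap each event in
  //   akka.persistence.journal.Tagged(event, Set(shardTag(id)))
  // - note the adapter only sees the event, so the event would need to
  // carry the user id for this to work (an assumption here).
  //
  // Read side (sketch): exactly numberOfTagShards permanently running
  // streams, one per tag, distributable over the cluster, e.g.
  //   (0 until numberOfTagShards).map(i => readJournal.eventsByTag(s"user-shard-$i", 0L))
}

object ShardTaggingDemo {
  def main(args: Array[String]): Unit = {
    import ShardTagging._
    // The tag is a pure function of the persistenceId, so it is stable:
    println(shardTag("user-42") == shardTag("user-42")) // prints true
    // 10k users collapse onto at most 100 tags / 100 streams:
    val distinctTags = (1 to 10000).map(i => shardTag(s"user-$i")).toSet
    println(distinctTags.size <= numberOfTagShards) // prints true
  }
}
```

The appeal is that the number of streams is now bounded by the number of tags, not by the number of users ever started, so the per-entity lifecycle question disappears on the read side.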


Since this is a use case so common that it's in the Akka documentation 
<http://doc.akka.io/docs/akka/2.4.7/scala/persistence-query.html#Materialize_view_to_Reactive_Streams_compatible_datastore>,
what's the best practice? It would be nice to have a discussion here and 
then add the results (if any) to the documentation. For now, it only says 
how to start queries, not how/when to stop them w.r.t. the lifecycle of 
our aggregate root/entity.
I think this is particularly relevant in the case of sharded persistent 
entities, or in general whenever we have a large number of aggregate roots 
(e.g. an "order" for an e-commerce site).


Let me know what you think!

Cheers
G

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user