After some thought, I could instead do "master-master" replication, because 
each entry in the cache is versioned, and newer versions always trump older 
ones --- something like an LWWMap, with versions playing the role of 
timestamps. In that scenario, I'd have several actors for each entity, and 
each of them would be able to initiate a download of the data, write it to 
its cache, and then send its values to its peers, which could merge them 
trivially. Clients reading the cache would specify a required version, and 
if the actor instance doesn't have a value that recent, it would initiate a 
download from the external source.
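To make the merge concrete, here's a minimal sketch of the version-trumping merge I have in mind (plain Scala, no Akka; the `Versioned` and `CacheMerge` names are made up for illustration):

```scala
// Versioned cache entry: a newer version always wins, so merging two
// replicas entry-by-entry is associative and idempotent, CRDT-style.
// On a version tie this keeps the existing value; a deterministic
// tie-breaker (e.g. a node id) would make the merge fully commutative.
final case class Versioned[V](version: Long, value: V)

object CacheMerge {
  // Merge two replica maps, keeping the entry with the higher version.
  def merge[K, V](a: Map[K, Versioned[V]],
                  b: Map[K, Versioned[V]]): Map[K, Versioned[V]] =
    b.foldLeft(a) { case (acc, (k, vb)) =>
      acc.get(k) match {
        case Some(va) if va.version >= vb.version => acc
        case _                                    => acc.updated(k, vb)
      }
    }
}
```

Since merging is idempotent, a peer can re-send its whole map after a reconnect without corrupting anything.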

However, the question remains of how to merge the values after the netsplit 
heals. Cluster sharding isn't recommended for use with auto-downing, but 
seemingly only because there may be several instances of an entity actor 
--- which I'm OK with. The only remaining problem is detecting when the 
partition heals and finding the peer. I think I'd listen to cluster 
membership changes and ask the newly added node whether it has this actor 
(by path), and if it does, exchange entries with it.
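Roughly what I mean by listening to membership changes and probing a peer by path, sketched with classic Akka cluster APIs (the `"replicas"` path segment and the `ExchangeEntries` message are made up; this assumes each replica actor lives at the same user-hierarchy path on every node):

```scala
import akka.actor.{Actor, ActorIdentity, Identify, RootActorPath}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{InitialStateAsEvents, MemberUp}

// Hypothetical message asking a peer replica for its versioned entries.
case object ExchangeEntries

class CacheReplica(entityId: String) extends Actor {
  private val cluster = Cluster(context.system)

  override def preStart(): Unit =
    cluster.subscribe(self,
      initialStateMode = InitialStateAsEvents, classOf[MemberUp])

  override def postStop(): Unit =
    cluster.unsubscribe(self)

  def receive: Receive = {
    case MemberUp(member) =>
      // A node (re)joined: probe the same actor path on it.
      val path = RootActorPath(member.address) / "user" / "replicas" / entityId
      context.actorSelection(path) ! Identify(entityId)

    case ActorIdentity(`entityId`, Some(peer)) =>
      // The peer exists there; ask it to send its entries for merging.
      peer ! ExchangeEntries

    case ActorIdentity(`entityId`, None) =>
      // No replica of this entity on that node; nothing to do.
  }
}
```

After a partition heals, each side sees the other's nodes as `MemberUp` again, so both initiate the exchange; since the merge is idempotent, the duplicate exchange is harmless.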

Do you see any obvious downsides to that approach?

Thanks,
Igor

On Wednesday, August 16, 2017 at 4:54:32 AM UTC+3, Igor Baltiyskiy wrote:
>
> Hi,
>
> I'm wondering whether "sharded replication" is possible with Akka. Let me 
> describe that in more detail.
>
> In my model, entities contain caches that are very expensive to recreate 
> from scratch (because they cache results of multiple calls to several 
> external systems). So I can't just use cluster sharding, because that would 
> result in one actor per entity, and when the node where that actor is 
> running goes down, the data is lost. On the other hand, since this data 
> still can be fetched, I don't want to persist it. What I want is to 
> replicate each cache across a few nodes in the cluster.
>
> After reading the documentation, I initially thought about "master-slave 
> replication": for each entity actor, setup a router that manages a pool of 
> worker actors that receive new values whenever the actor updates its cache. 
> Then clients of this entity would be load-balanced across these worker 
> actors. (The clients are OK with non-monotonic reads.) Whenever the 
> "master" actor fails, one of the slaves should be promoted. 
> Note that though clients are OK with non-monotonic reads, writes must be 
> monotonic: older values in the cache must not overwrite newer values. So 
> promotion would require some complex merging of slave data. 
> A similar problem is behaviour during netsplit: after the partition heals, 
> masters need to merge their caches.
> And I glossed over the promotion details, and how the clients locate the 
> actors after the master fails, etc. All in all, this case doesn't seem to 
> be handled by cluster sharding and routers out of the box.
>
> So I turned to distributed data: I might represent the cache as a CRDT and 
> that would fix the merging problems above. However, there are several new 
> problems:
> - I don't want to have all caches replicated on all nodes in the cluster. 
> Rather, I'd like something like Riak, where a particular entity is 
> replicated across n nodes, with nodes chosen according to some rule like 
> consistent hashing. 
> From the docs, I get the impression that it is possible to define more 
> than one Replicator per node:
>
> > [Replicator] communicates with other Replicator instances with the same 
> path (without address) that are running on other nodes.
>
> > Cluster Sharding is using its own Distributed Data Replicator per node 
> role. In this way you can use a subset of all nodes for some entity types 
> and another subset for other entity types.
>
> If so, I would have a Replicator per entity. Is that correct? And what are 
> the practical limits on the number of different Replicators --- per node, 
> per cluster? My estimate is that the maximum possible number of entities is 
> on the order of 100 000.
> - Do different Replicator instances gossip independently of each other, or 
> is it a node-wide activity? 
> - If my understanding is correct, I can specify which nodes will host the 
> replicas by starting Replicator actors on those nodes with a path 
> containing the entity ID. How do I select the nodes? I could perhaps use a 
> cluster-aware router (e.g. ConsistentHashingGroup) to handle that; I'd 
> have to death-watch actor instances to manage this Replicator's lifecycle. 
> Is this approach good practice?
>
> Thanks
> Igor
>

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user