Hello fellow hAkkers,

we have multiple persistent actor types distributed using cluster sharding. 
Some of them logically belong together; let's say they're customers and 
their orders. Customers never talk to orders of other customers, and vice 
versa. Thus it makes sense to us to have these actors reside on the same 
cluster shard (and consequently, in the same VM). 

We implemented this by returning identical ShardIds for the customer c123 
and its orders c123-o0, c123-o1, etc. But of course, this doesn't work like 
we thought it would. :) The ShardResolvers of two instances of ShardRegion 
operate independently, and we just end up with two shards -- one for 
customers and one for orders -- which share a name but not necessarily a 
cluster host. I have seen this misunderstanding crop up a few times before 
on this list, which makes it slightly less embarrassing to admit the 
mistake. ;)
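For reference, our shard extraction looks roughly like this (a minimal 
sketch, names hypothetical): both entity types derive the ShardId from the 
customer portion of the entity id, so "c123" and "c123-o0" both resolve to 
shard "c123".

```java
// Hypothetical sketch of the shardId extraction we plugged into both
// ShardResolvers: strip the order suffix so customer and order entities
// map to the same ShardId string.
final class CustomerShardResolver {
    static String extractShardId(String entityId) {
        int dash = entityId.indexOf('-');
        // "c123-o1" -> "c123"; a plain customer id "c123" is returned as-is
        return dash < 0 ? entityId : entityId.substring(0, dash);
    }
}
```

As described above, identical ShardIds alone don't buy us co-location, 
because each ShardRegion resolves and allocates its shards independently.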

We could stop using cluster sharding for the orders completely, and instead 
route all messages for the orders through the customers, which would 
restart the order actors on demand. But that sounds like a lot of extra 
code: many other actors talk to the orders[0], and the customers shouldn't 
have to route those messages or worry about them; the customer actors need 
not even be alive for them. And we'd also have to worry about the other 
things that cluster sharding does: support for passivation of orders, 
gracefully handling rebalances of customers (killing all order actors when 
it happens, I guess), maybe other things.

[0] I realize that this will lead to the question: if many other actors 
talk to the orders without involving the customers, why do you want them on 
the same host? Let's just assume for the sake of argument that circumstances 
make this a reasonable requirement, unless you're saying it's not a 
reasonable requirement under any circumstances.

The alternative would involve writing a custom ShardAllocationStrategy 
that's shared among the customer and order ShardRegions. I suppose it would 
involve the following:
 - maintain the associations between ShardRegion ActorRef and ShardIds for 
each entity type;
 1. for a newly requested allocation for entity type X:
 2. check whether the same shardId is already allocated for any other 
entity type Y, yielding (at least) one associated shardRegionActorRefY;
 3. if so, determine whether there is any shardRegionActorRefX for entity 
type X that's on the same host as shardRegionActorRefY;
 4. if so, allocate the shardId to shardRegionActorRefX (i.e. return it; 
optionally balance between several candidates);
 5. otherwise, fall back to any other ShardAllocationStrategy (updating the 
associations based on its return value).

Eugh. I feel dirty now. Apart from the general horrificness, I imagine step 
3 is fraught with peril. And of course, the whole thing would need to be 
thread-safe because it will be accessed and modified concurrently by 
several ShardRegions. (Time to dust off ye olde ConcurrentHashMap.) The 
more I look at it, the more fragile and less feasible it seems.

At the same time, having this sort of control over the clustering of 
several entity types does not seem particularly outrageous. Are we missing 
something?

Thanks as always for your thoughts,
Moritz
