Hi,

We are investigating Akka Distributed Data for storing a (Long -> Long)
mapping in an LWWMap.

We plan for up to ~1 million entries and expect a write load of a few
hundred to a few thousand entries per day; in contrast, we expect a read
load of a few hundred requests per second. Therefore, we plan to use
ReadLocal together with WriteMajority.
Roughly calculating the size of the map, we should end up with some
20-30 MB, which feels okay-ish as long as delta-CRDTs work.
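
For concreteness, here is a minimal sketch of the setup we have in mind
(classic Replicator API; the key name and the example IDs are made up):

import akka.actor.ActorSystem
import akka.cluster.Cluster
import akka.cluster.ddata.{DistributedData, LWWMap, LWWMapKey}
import akka.cluster.ddata.Replicator._
import scala.concurrent.duration._

val system = ActorSystem("cars")
val cluster = Cluster(system)
val replicator = DistributedData(system).replicator

// One top-level map holding all (Long -> Long) entries.
val EngineToCar = LWWMapKey[Long, Long]("engine-to-car")

// Writes are rare, so we can afford WriteMajority.
replicator ! Update(EngineToCar, LWWMap.empty[Long, Long],
  WriteMajority(timeout = 5.seconds)) { map =>
  map.put(cluster, 42L, 4711L) // engineId -> carId
}

// Reads are frequent, so ReadLocal; the GetSuccess reply goes to the
// sender, i.e. this should be sent from within an actor.
replicator ! Get(EngineToCar, ReadLocal)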

While testing with only a few thousand entries, we see huge propagation
delays from one node to another, on the order of tens of seconds (both
nodes running on the same PC). This raises the concern whether Akka
Distributed Data is really a valid option for our use case (explained in
a bit more detail below). What worries us most is what happens when we
need to do a full cluster restart and the map fills back up at a rate of
something like 100 entries per second. We currently do not feel too
confident that this can work.

I already asked in the Akka-User chat and got the feedback that Artery
could help a bit to overcome head-of-line blocking (we see the phi
accrual failure detector logging about higher delays). We tried that and
got even slower updates, IIRC. There is also the usual suspect of
manually sharding the data into multiple top-level maps (sketched
below), which we could go for, but we would still have 100 maps with 10k
entries each.
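
If we go the manual sharding route, we imagine something like this (the
key prefix and the map count are purely illustrative):

import akka.cluster.ddata.LWWMapKey

// Spread ~1 million entries over 100 top-level maps (~10k entries
// each) by hashing the engine ID into the key name.
val NumberOfMaps = 100

def mapKeyFor(engineId: Long): LWWMapKey[Long, Long] =
  LWWMapKey[Long, Long]("engine-to-car-" + math.abs(engineId % NumberOfMaps))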

Our use case is as follows; I will describe it using the example of a
CarActor, which is addressed by its ID via Cluster Sharding (so that we
have exactly one actor per imaginary car in the whole cluster). However,
we also need to send messages to that actor via 3 other identifiers,
let's say the IDs of the engine, the gear, and the exhaust pipe (that
these parts are occasionally replaced and even transferred to a
different car reflects our scenario pretty well; reads would happen
every time the car passes a toll bridge.. damn.. I think we could earn
way more money if we really built that very system :-)). So these three
mappings (engine ID -> car ID, gear ID -> car ID, exhaust ID -> car ID)
are what additionally needs to be disseminated among the nodes
participating in Cluster Sharding. The characteristics of CRDTs sound
very appealing here - an engine wouldn't end up in another car in the
very next second, which leaves enough time for convergence via gossip.
Having the whole data structure in memory feels like nothing compared to
how we run our system currently (all nodes store all data and mostly
successfully invalidate the other nodes' state, backed by a heap of
>100 GB).
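
To make the message flow concrete, this is roughly how we picture
resolving a message that is addressed by engine ID (all names here are
made up for illustration):

import akka.actor.{Actor, ActorRef}
import akka.cluster.ddata.{DistributedData, LWWMapKey}
import akka.cluster.ddata.Replicator._

case class MessageByEngine(engineId: Long, payload: Any)
case class CarEnvelope(carId: Long, payload: Any)

// Resolves engineId -> carId from the local replica and forwards the
// payload to the sharded CarActor region.
class EngineResolver(carRegion: ActorRef) extends Actor {
  private val replicator = DistributedData(context.system).replicator
  private val EngineToCar = LWWMapKey[Long, Long]("engine-to-car")

  def receive = {
    case m: MessageByEngine =>
      // ReadLocal answers from this node's replica, no remote round trip.
      replicator ! Get(EngineToCar, ReadLocal, request = Some((m, sender())))

    case g @ GetSuccess(EngineToCar, Some((m: MessageByEngine, replyTo: ActorRef))) =>
      g.get(EngineToCar).get(m.engineId) match {
        case Some(carId) => carRegion.tell(CarEnvelope(carId, m.payload), replyTo)
        case None        => // unknown engine (yet): drop, or retry with ReadMajority
      }

    case NotFound(EngineToCar, _) => // map has not been created yet
  }
}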

It would be nice to get your feedback here, both on whether CRDTs will
kill us one day and on other ideas for how to disseminate the mappings.
In our minds we have: 1) storing them in Redis, accessible by all
cluster sharding nodes, and 2) having some mapping actors that know the
mappings and store them in some local data structure; either 2a) one
mapping actor per node knowing the whole state, or 2b) one actor per
engine ID, addressed via Cluster Sharding, knowing only its car ID after
looking it up once in the database, which would leave us with 3 million
tiny mapping actors alive (sketched below) ¯\_(ツ)_/¯.
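
A sketch of what we have in mind for 2b (again, all names are ours and
the database call is stubbed out):

import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.Future

case class LookupCarId(engineId: Long)
case class CarIdIs(carId: Long)

// One tiny sharded actor per engine ID: loads its car ID from the
// database once, then answers every lookup from memory.
class EngineMappingActor extends Actor {
  import context.dispatcher

  private var carId: Option[Long] = None

  override def preStart(): Unit =
    // With Cluster Sharding the entity name would be the engine ID;
    // loadFromDatabase stands in for our actual async DB query.
    loadFromDatabase(self.path.name.toLong).map(CarIdIs).pipeTo(self)

  def receive = {
    case CarIdIs(id)    => carId = Some(id)
    // Lookups arriving before the DB answer are dropped in this sketch;
    // in reality we would stash them.
    case _: LookupCarId => carId.foreach(id => sender() ! CarIdIs(id))
  }

  private def loadFromDatabase(engineId: Long): Future[Long] =
    Future.successful(4711L) // stub
}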

Thanks
Steffen
