Re: [akka-user] Scalability of Akka Distributed Data

Patrik Nordwall Thu, 21 Dec 2017 11:17:51 -0800

Hi Steffen,

As documented, Distributed Data has limitations with regard to data size.
It is crucial to split the data up into many top-level entries. I’d suggest
you try with 1000-10000 top-level entries.
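
As a minimal sketch of such a split (not taken from the Akka docs), an ID can
be mapped to one of a fixed number of top-level entry names, and each name
could then serve as the id of its own LWWMapKey. The class name, the
"engine-to-car" prefix and the bucket count of 1000 are assumptions chosen
for illustration only.

```java
final class TopLevelKeys {
    // Hypothetical bucket count, picked from the 1000-10000 range suggested above.
    static final int BUCKETS = 1000;

    // Maps an entry ID to the name of the top-level entry (bucket) that holds it.
    // floorMod keeps the bucket index non-negative even for negative IDs.
    public static String keyFor(String prefix, long id, int buckets) {
        int bucket = (int) Math.floorMod(id, (long) buckets);
        return prefix + "-" + bucket;
    }

    public static void main(String[] args) {
        // The "engine-to-car" prefix is an illustrative name, not an Akka convention.
        System.out.println(keyFor("engine-to-car", 123456789L, BUCKETS)); // engine-to-car-789
    }
}
```

Reads and writes for a given ID would then only touch the one small map
behind that name, so a full-state transfer of a single top-level entry stays
small.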


Even with delta-CRDTs, the full state of an entry must sometimes be
transferred, so the message carrying it mustn’t be too big (<200 kB). One
case where this happens is when a new node is added to the cluster.
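
As a rough sanity check of that bound, the arithmetic can be sketched as
follows. It assumes the data is spread evenly over the top-level entries and
uses the upper end of the 20-30 MB estimate from the quoted message; real
serialized sizes will differ, so treat the numbers as order-of-magnitude only.

```java
final class FullStateSize {
    // Approximate size of one top-level entry when the data is spread evenly.
    public static long bytesPerEntry(long totalBytes, int topLevelEntries) {
        return totalBytes / topLevelEntries;
    }

    // Smallest number of top-level entries that keeps each one at or below the limit.
    public static long minTopLevelEntries(long totalBytes, long limitBytes) {
        return (totalBytes + limitBytes - 1) / limitBytes; // ceiling division
    }

    public static void main(String[] args) {
        long total = 30_000_000L; // assumed: upper end of the 20-30 MB estimate
        long limit = 200_000L;    // the ~200 kB bound mentioned above

        System.out.println(bytesPerEntry(total, 100));        // 300000, above the bound
        System.out.println(bytesPerEntry(total, 1000));       // 30000, well below it
        System.out.println(minTopLevelEntries(total, limit)); // 150
    }
}
```

Under these assumptions, 100 top-level maps would each exceed the bound,
while 1000 or more leave comfortable headroom.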

I’m slightly uncertain, but I think LWWMap doesn’t support delta-CRDTs at
all (ORMap.put), so it’s important to split the data into many top-level
entries instead.

I suggest that you create a simplified prototype for your use case. Then we
could also try it, profile it and see if something could be improved.
It doesn’t sound impossible to handle this amount of data.

Yes, you should use Artery.

/Patrik



On Thu, 21 Dec 2017 at 09:40, <[email protected]> wrote:

> Hi,
>
> We are investigating Akka Distributed Data for storing a (Long -> Long)
> mapping in an LWWMap.
>
> We plan for up to ~1 million entries and expect a write load of a few
> hundred to a few thousand entries per day; in contrast, we expect a read
> load of some hundred requests per second. Therefore, we plan to use
> readLocal together with writeMajority.
> Roughly calculating the size of the map, we should end up with some
> 20-30 MB, which feels okay-ish as long as delta-CRDTs work.
>
> While testing with only a few thousand entries, we see huge propagation
> delays from one node to another, on the order of tens of seconds (both
> running on the same PC). This raises the concern whether Akka Distributed
> Data is really a valid option for our use case (I will explain it in a bit
> more detail below). What concerns us is what happens when we need to do a
> full cluster restart, when this map will fill up at a rate of about 100
> entries per second. We currently do not feel too confident that this can
> work.
>
> I already asked in the Akka-User chat and got the feedback that Artery
> could help a bit to overcome the head-of-line blocking (we see the Phi
> value detector logging about higher delays). We tried that and got even
> slower updates IIRC. The other suggestion was the usual suspect: manually
> sharding it into multiple top-level maps, which we could go for, but we
> would still have 100 maps at 10k entries each.
>
> Our use case is as follows and I will describe it using an example of a
> CarActor, which is addressed by its ID with Cluster Sharding (so that we
> have one actor per imaginary car in the whole cluster). However, we need to
> send messages to that actor also via 3 other identifiers, let's say the IDs
> of the engine, gear, and the exhaust pipe (that these parts are
> occasionally replaced and even transferred to a different car pretty well
> reflects our scenario; reads would be like every time we pass a toll
> bridge.. damn.. I think we could earn way more money if we would really
> build that very system :-)). So these three mappings (engine ID -> car ID,
> gear ID -> car ID, exhaust ID -> car ID) are what additionally needs to be
> disseminated among the nodes participating in cluster sharding. The
> characteristics of CRDTs sound very appealing here: an engine wouldn't be
> in another car in the very next second, leaving enough time for convergence
> via gossip. Having the whole data structure in memory feels like nothing
> compared to how we run our system currently (all nodes store all data,
> mostly successfully invalidate other nodes' state, backed by a heap of
> >100 GB).
>
> It would be nice to get your feedback on both points: whether CRDTs will
> kill us one day, and whether you have other ideas for how to disseminate
> the mapping. In our mind we have 1) storing the mappings in Redis,
> accessible by all cluster sharding nodes, and 2) having some mapping actors
> that know the mappings and store them in some local data structure; either
> 2a) one mapping actor per node knowing the whole state or 2b) one actor per
> engine ID, addressed via cluster sharding, knowing only the car ID by
> looking it up once in the database, leading to 3 million tiny mapping
> actors alive ¯\_(ツ)_/¯.
>
> Thanks
> Steffen
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>
