On Sun, Apr 23, 2017 at 5:32 PM, Curt Siffert <[email protected]> wrote:
>> On Apr 22, 2017, at 9:12 AM, Justin du coeur <[email protected]> wrote:
>>
>> Personally, I'd likely use Akka Cluster Sharding for this -- assuming each request can be summarized in a way that fits as an entity identifier, it seems like a good fit. In this model, the sharded entity would serve as a read-through cache for the external service: your stream would go to the sharded Actor -- if it's the first request, the Actor would make the external request; otherwise it would already know about the value and return it. Cluster Sharded Actors are a lot like an in-memory cache, and I use them for exactly this in a number of cases.
>>
>> I'd probably only bother with Akka Persistence (I assume you're thinking of building this on top of Redis) if the number of distinct request types is too large to typically fit in memory (so the Sharded Actors would need to sometimes get flushed), and too expensive/slow to ever want to re-run a request externally. Might be worthwhile, might not; it depends on the external constraints.
>
> That's good to hear. I have some experience setting up an Akka Cluster in AWS, but haven't used Cluster Sharding.

It's my favorite part of the Akka stack -- surprisingly easy to use once you get the idea.

> We might be able to get away without using Akka Persistence -- we're still working out those external constraints. Once "cached" (instantiated), the value should remain valid for quite a while (the source data doesn't change often), and the external lookup would be against another internal/nearby low-latency service anyway.

In that case I might not bother with Akka Persistence -- it just adds another external service without a ton of benefit, if re-querying the external service is adequately fast/cheap. (That said, if you aggregate the shards as described below, Akka Persistence provides a relatively easy way to save snapshots of each aggregate, and quickly restore them if and when shards get rebalanced.)
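To make the read-through pattern concrete, here's a minimal sketch of the logic the sharded entity would implement. It's deliberately written as a plain Scala class rather than an Actor so the behavior is easy to see in isolation; in real code this state would live inside a cluster-sharded Actor, and `fetchRemote` is a hypothetical stand-in for the call to the external service.

```scala
// Sketch of the read-through behavior a sharded cache entity would implement.
// In actual Akka code this Map would be the sharded Actor's private state;
// `fetchRemote` stands in for the external service call.
final class ReadThroughCache[K, V](fetchRemote: K => V) {
  private var cached = Map.empty[K, V]

  def lookup(key: K): V =
    cached.get(key) match {
      case Some(v) => v              // already known: answer from memory
      case None =>
        val v = fetchRemote(key)     // first request: hit the external service
        cached += (key -> v)         // remember it for subsequent requests
        v
    }
}
```

The key property is that the external service is consulted at most once per key; every later request for the same key is answered from the entity's memory.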
> If by distinct request types you mean the number of values cached, we're probably in the tens of millions, or perhaps hundreds of millions in time. It's only one type of request though -- two parameters that return a small data structure. We haven't sized the data structure yet, but it's roughly similar to an Address object: a flat structure with a handful of string fields.

Yeah -- in that case I suspect that Distributed Data is a non-starter. (Although, mind, I haven't done a lot with it myself yet, so I'm speaking from my rough knowledge of this one.)

> Since it's a cluster that new instances can join, is memory effectively unbounded?

Pretty much, yeah -- my recollection is that they've tested clusters up to somewhere in the ballpark of a thousand servers, so you can get a *lot* of memory in there.

> That's an actor instance for each value we'd otherwise cache, so tens or hundreds of millions of actor instances. I know Akka scales out well, but I haven't seen much comparing it to the performance of something like Redis.

Don't know -- my understanding is that Akka clustering scales well, but I don't know Redis at all well, so I don't know how the latency compares.

> I wonder at what point (if ever) an actor-per-cached-value doesn't make sense anymore compared to an external cache? I wonder if this also impacts what to pick for "state-store-mode".

AFAIK, the only constraint on dynamically growing the cluster is that you need to pre-declare the *total* number of shards, so if you expect growth I would build planned growth into the entity-to-shard multiplier. (The rough rule of thumb is that 10 shards per node is good, but AFAIK there's nothing wrong with starting with a larger number of shards per node, and letting those gradually re-allocate as you add nodes and the shards rebalance.) I'd actually worry more about the basic latency -- Redis is more tuned for this sort of thing, so it might be significantly faster.
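A rough sketch of that shard-planning arithmetic, since the total shard count has to be fixed up front. The specific numbers here are illustrative, not recommendations:

```scala
// Shard planning sketch: the total number of shards is declared once, at
// cluster start, so size it for the cluster you expect to *grow into*,
// not the one you have today. ~10 shards per node is the usual rule of thumb.
val shardsPerNode    = 10
val expectedMaxNodes = 50   // planned-for growth, not current size
val numberOfShards   = shardsPerNode * expectedMaxNodes

// The shard extractor then just maps an entity id onto that fixed range.
// (hashCode can be negative, hence the double-modulo.)
def shardId(entityId: String): String = {
  val h = entityId.hashCode
  (((h % numberOfShards) + numberOfShards) % numberOfShards).toString
}
```

Starting with more shards than nodes is harmless: several shards simply co-locate on each node, and they spread out as the cluster grows and rebalances.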
But I would expect that, if Cluster Sharding performance is reasonable to start with, it would scale well.

Given how small the data structure is, I'd likely look for a way to sensibly aggregate the entries -- while Actors are pretty cheap, they're a good order of magnitude or so larger than an Address, so Actor-per-result is pretty inefficient. So I'd probably do some sort of "sub-sharding": find an algorithmic way to divide the namespace so that a given cache Actor contains a map of hundreds-to-thousands of results, which makes much more efficient use of memory. That's how I do my caching: I basically hash the query and divide the resulting namespace among a relatively modest number of sharded Actors. That's very easy to implement in Cluster Sharding, and it reduces the overhead a lot.

For example, here is my IdentityCache <https://github.com/jducoeur/Querki/blob/master/querki/scalajvm/app/querki/identity/IdentityCache.scala>, and the routing extractors can be found here <https://github.com/jducoeur/Querki/blob/master/querki/scalajvm/app/querki/identity/IdentityEcot.scala#L58-L63> and here <https://github.com/jducoeur/Querki/blob/master/querki/scalajvm/app/models/OID.scala#L49-L53>.
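The sub-sharding idea above can be sketched as a pure routing function plus the entity's internal map -- this is a hypothetical illustration of the hash-and-divide approach, not the actual IdentityCache code linked above, and the bucket count is made up:

```scala
// Sub-sharding sketch: hash the query to pick which cache Actor owns it,
// so a modest number of entities each hold a map of many small results.
val cacheActorCount = 1000   // illustrative; each entity holds many results

// Which cache entity owns this query -- this is the value you'd hand to
// Cluster Sharding's entity-id extractor. (Double-modulo because hashCode
// can be negative.)
def cacheEntityIdFor(query: String): String = {
  val h = query.hashCode
  (((h % cacheActorCount) + cacheActorCount) % cacheActorCount).toString
}

// Inside each entity, the full query string keys a map of the
// Address-sized results, instead of one Actor per result.
final case class CacheEntity(results: Map[String, String] = Map.empty) {
  def lookup(query: String): Option[String] = results.get(query)
  def store(query: String, result: String): CacheEntity =
    copy(results = results + (query -> result))
}
```

Routing is deterministic (the same query always hashes to the same entity), which is what lets the cluster find the right cache Actor without any lookup table.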
