On Sun, Apr 23, 2017 at 5:32 PM, Curt Siffert <[email protected]> wrote:

> > On Apr 22, 2017, at 9:12 AM, Justin du coeur <[email protected]> wrote:
> > Personally, I'd likely use Akka Cluster Sharding for this -- assuming
> each request can be summarized in a way that fits as an entity identifier,
> it seems like a good fit.  In this model, the sharded entity would serve as
> a read-through cache for the external service: your stream would go to the
> sharded Actor -- if it's the first request, the Actor would make the
> external request, otherwise it would already know about the value and
> return it.  Cluster Sharded Actors are a lot like an in-memory cache, and I
> use them for exactly this in a number of cases.
> >
> > I'd probably only bother with Akka Persistence (I assume you're thinking
> of building this on top of Redis) if the number of distinct request types
> is too large to typically fit in memory (so the Sharded Actors would need
> to sometimes get flushed), and too expensive/slow to want to ever re-run a
> request externally.  Might be worthwhile, might not; it depends on the
> external constraints.
>
> That’s good to hear. I have some experience setting up an Akka Cluster
> in AWS, but haven’t used cluster sharding.
>

It's my favorite part of the Akka stack -- surprisingly easy to use once
you get the idea.
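For anyone following the thread, the read-through pattern described above boils down to something like this. This is a plain-Scala sketch of the entity's caching logic, not actual Akka code -- in practice this state would live inside the sharded Actor, and `fetchRemote` stands in for the call to the external service:

```scala
// Sketch of the read-through logic a sharded cache entity would run.
// `fetchRemote` is a stand-in for the external service call.
class ReadThroughCache[K, V](fetchRemote: K => V) {
  private var cached: Map[K, V] = Map.empty

  def lookup(key: K): V =
    cached.get(key) match {
      case Some(value) => value          // already known: serve from memory
      case None =>
        val value = fetchRemote(key)     // first request: hit the external service
        cached += (key -> value)
        value
    }
}
```

Inside a sharded Actor you get the single-threaded access for free, so the mutable Map needs no synchronization.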


> We might be able to get away without using Akka Persistence - we’re still
> working out those external constraints. Once “cached” (instantiated), the
> value should remain valid for quite a while (source data doesn’t change
> often), and the external lookup would be against another internal/nearby
> low-latency service anyway.
>

In which case I might not bother with Akka Persistence -- it just adds
another external service without a ton of benefit, if re-querying the
external service is adequately fast/cheap.  (That said, if you aggregate
the shards as described below, Akka Persistence provides a relatively easy
way to save snapshots of each aggregate, and quickly restore them if and
when shards get rebalanced.)

> If by distinct request types you mean number of values cached, we’re
> probably in the tens of millions, or perhaps hundreds of millions in time.
> It’s only one type of request though, two parameters that will return back
> a small data structure. We haven’t sized the data structure yet but it is
> roughly similar to an Address object - a flat structure with a handful of
> string fields.
>

Yeah -- in that case I suspect that Distributed Data is a non-starter.
 (Although, mind, I haven't done a lot with it myself yet, so I'm speaking
from my rough knowledge of this one.)


> Since it’s a cluster that new instances can join, is memory effectively
> unbounded?


Pretty much, yeah -- my recollection is that they've tested clusters up to
somewhere in the ballpark of a thousand servers, so you can get a *lot* of
memory in there.

> That’s an actor instance for each value we’d otherwise cache, so tens or
> hundreds of millions of actor instances. I know akka scales out well but I
> have not seen much comparing it to the performance of something like Redis.


Don't know -- my understanding is that Akka clustering scales well, but I
don't know Redis at all well, so I don't know how the latency compares.


> I wonder at what point (if ever) an actor-per-cached-value stops making
> sense compared to an external cache? I wonder if this also impacts what
> to pick for “state-store-mode”.


AFAIK, the only constraint on dynamically growing it is that you need to
pre-declare the *total* number of shards, so if you expect growth I would
build planned growth into the entity-to-shard multiplier.  (The rough rule
of thumb is that 10 shards per node is good, but AFAIK there's nothing
wrong with starting with a larger number of shards/node, and letting those
gradually re-allocate as you add nodes and rebalance the shards.)
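Concretely, the shard extractor is just a stable hash modulo that fixed total, so the headroom is one number you choose up front. The 1000 here is an assumed figure (roughly 10 shards/node times a generous guess at maximum node count), and in real Akka code this function would be the body of `ShardRegion.ExtractShardId`:

```scala
// The total shard count is fixed for the life of the cluster, so pick it
// with growth headroom: ~10 shards per node at the *largest* cluster you
// expect. 1000 shards comfortably covers up to ~100 nodes.
val numberOfShards = 1000

// Stable entity-id-to-shard-id mapping; floorMod avoids the negative
// result a plain % would give for negative hashCodes.
def shardIdFor(entityId: String): String =
  Math.floorMod(entityId.hashCode, numberOfShards).toString
```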

I'd actually worry more about the basic latency -- Redis is more tuned for
this sort of thing, so it might be significantly faster.  But I would
expect that, if Cluster Sharding performance is reasonable to start with,
it would scale well.

Given how small the data structure is, I'd likely look for a way to
sensibly aggregate them -- while Actors are pretty cheap, they're a good
order of magnitude or so larger than an Address, so Actor-per-result is
pretty inefficient.  So I'd probably do some sort of "sub-sharding",
looking for an algorithmic way to divide the namespace so that a given
cache Actor contains a map of hundreds-to-thousands of results, so that it
makes more efficient use of memory.  That's how I do my caching: I
basically hash the query, and divide the resulting namespace among a
relatively modest number of sharded Actors.  That's very easy to implement
in Cluster Sharding, and reduces the overhead a lot.  For example, here is
my IdentityCache
<https://github.com/jducoeur/Querki/blob/master/querki/scalajvm/app/querki/identity/IdentityCache.scala>,
and the routing extractors can be found here
<https://github.com/jducoeur/Querki/blob/master/querki/scalajvm/app/querki/identity/IdentityEcot.scala#L58-L63>
and here
<https://github.com/jducoeur/Querki/blob/master/querki/scalajvm/app/models/OID.scala#L49-L53>
.
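A rough sketch of that sub-sharding scheme, adapted to the two-parameter lookup you described (names and the aggregate count are illustrative, and in practice `CacheAggregate` would be the sharded Actor, keyed by `aggregateIdFor`):

```scala
// Sub-sharding: instead of one entity per cached value, hash the query
// down to a modest number of aggregate entities, each holding a Map of
// hundreds-to-thousands of results.
final case class Query(param1: String, param2: String)

// Assumed sizing: tens of millions of values / a few thousand per aggregate.
val numberOfAggregates = 4096

// Entity id of the aggregate that owns this query's result; case-class
// hashCode is stable for equal field values.
def aggregateIdFor(q: Query): String =
  Math.floorMod(q.hashCode, numberOfAggregates).toString

// State each aggregate entity would carry (a sharded Actor in practice):
class CacheAggregate[V] {
  private var results: Map[Query, V] = Map.empty
  def get(q: Query): Option[V] = results.get(q)
  def put(q: Query, v: V): Unit = results += (q -> v)
}
```

The routing extractors then send every message for the same `aggregateIdFor(q)` to the same entity, which is what keeps the per-Actor overhead amortized over many cached values.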

--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
