Yes, our production traffic all uses the binary protocol, even behind the
on-server proxy we run. In fact, if you have a way to reduce syscalls by
batching responses, that would solve another huge pain point of our own making.
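
(Concretely, the kind of batching I mean is roughly the sketch below: instead of
one write(2) per response, queue the response buffers and flush them with a
single writev(2) at the end of the event. This is only an illustration of the
technique, not memcached's actual code, and every name in it is made up.)

    /* Sketch only: coalesce per-response write() calls into one writev(). */
    #include <stddef.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    #define MAX_BATCH 64   /* hypothetical cap on responses queued per flush */

    struct resp_batch {
        struct iovec iov[MAX_BATCH];
        int count;
    };

    /* queue a fully built response buffer instead of writing it immediately */
    static void batch_add(struct resp_batch *b, void *buf, size_t len) {
        b->iov[b->count].iov_base = buf;
        b->iov[b->count].iov_len  = len;
        b->count++;
    }

    /* one syscall flushes every queued response for this connection */
    static ssize_t batch_flush(int fd, struct resp_batch *b) {
        if (b->count == 0) return 0;
        ssize_t n = writev(fd, b->iov, b->count);  /* short writes still need handling */
        b->count = 0;
        return n;
    }
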
*Scott Mansfield*
Product > Consumer Science Eng > EVCache > Sr. Software Eng
{
M: 352-514-9452
E: [email protected]
K: {M: mobile, E: email, K: key}
}
On Wed, Jan 25, 2017 at 11:33 AM, dormando <[email protected]> wrote:
> Okay, so it's the big rollup that gets delayed. Makes sense.
>
> You're using binary protocol for everything? That's a major focus of my
> performance annoyance right now, since every response packet is sent
> individually. I should have that switched to an option at least pretty
> soon, which should also help with the time it takes to service them.
>
> I'll test both ascii and binprot + the reqs_per_event option to see how bad
> this is, measurably.
>
> On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
>
> > The client is the EVCache client jar: https://github.com/netflix/evcache
> > When a user calls the batch get function on the client, the batch is spread
> > out over many servers, since keys hash to different servers. Imagine many of
> > these batch gets happening at the same time, though: each server's queue
> > ends up with gets from many different user-facing batch gets, all intermixed.
> > These client-side read queues are rather large (10000) and might end up
> > sending a batch of a few hundred keys at a time. These large batch gets are
> > sent off to each server as "one"
> > getq|getq|getq|getq|getq|getq|getq|getq|getq|getq|noop package and read back
> > in that order. We read the responses fairly efficiently internally, but the
> > batch get call the user made is waiting on the data from all of these
> > separate servers to come back before it can respond to the user
> > synchronously.
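> >
> > (Roughly, that package is just a run of binary-protocol GETQ request headers
> > followed by a single NOOP to flush out the misses. Below is a sketch of the
> > framing in C; it only illustrates the wire format and is not our client,
> > which is Java.)
> >
> >     /* Sketch: build one getq|getq|...|noop pipeline for a server's keys.
> >      * Binary protocol opcodes: GETQ = 0x09, NOOP = 0x0a. */
> >     #include <stdint.h>
> >     #include <string.h>
> >
> >     /* append a 24-byte GETQ header plus the key; returns bytes written */
> >     static size_t add_getq(uint8_t *out, const char *key, uint16_t klen) {
> >         uint8_t h[24] = {0};
> >         h[0] = 0x80;            /* request magic */
> >         h[1] = 0x09;            /* GETQ: quiet get, a miss sends no response */
> >         h[2] = klen >> 8;       /* key length, big-endian */
> >         h[3] = klen & 0xff;
> >         h[10] = klen >> 8;      /* total body length == key length */
> >         h[11] = klen & 0xff;
> >         memcpy(out, h, sizeof(h));
> >         memcpy(out + sizeof(h), key, klen);
> >         return sizeof(h) + klen;
> >     }
> >
> >     /* the trailing NOOP always gets a response, marking the batch's end */
> >     static size_t add_noop(uint8_t *out) {
> >         uint8_t h[24] = {0};
> >         h[0] = 0x80;            /* request magic */
> >         h[1] = 0x0a;            /* NOOP */
> >         memcpy(out, h, sizeof(h));
> >         return sizeof(h);
> >     }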
> >
> > Now on the memcached side, there are many servers all seeing this same
> > pattern of many large batch gets. Memcached will stop responding to a
> > connection after 20 requests on the same event and go serve other
> > connections. If that happens, any user-facing batch call that is waiting on
> > a getq command still queued on that connection can be delayed. It doesn't
> > normally cause timeouts, but it does at a low rate.
> >
> > Our timeouts for this app in particular are 5 seconds for a single
> > user-facing batch get call. This client app is fine with higher latency for
> > higher throughput.
> >
> > At this point we have the reqs_per_event set to a rather high 300 and it
> > seems to have solved our problem. I don't think it's causing any more
> > consternation (for now), but having a dynamic setting would have lowered
> > the operational complexity of the tuning.
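> >
> > (For reference, that bump is just the startup flag, roughly the line below;
> > the flags other than -R are ours and purely illustrative. You can confirm
> > the value the server is actually using via "stats settings".)
> >
> >     memcached -m 61440 -t 8 -R 300    # -R sets reqs_per_event
> >
> >     $ printf 'stats settings\r\nquit\r\n' | nc localhost 11211 | grep reqs_per_event
> >     STAT reqs_per_event 300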
> >
> >
> > Scott Mansfield
> > Product > Consumer Science Eng > EVCache > Sr. Software Eng
> > {
> > M: 352-514-9452
> > E: [email protected]
> > K: {M: mobile, E: email, K: key}
> > }
> >
> > On Wed, Jan 25, 2017 at 11:04 AM, dormando <[email protected]> wrote:
> > I guess when I say dynamic I mostly mean runtime-settable. Dynamic is a
> > little harder so I tend to do those as a second pass.
> >
> > You're saying your client had head-of-line blocking for unrelated
> > requests? I'm not 100% sure I follow.
> >
> > Big multiget comes in, multiget gets processed slightly slower than normal
> > due to other clients making requests, so requests *behind* the multiget
> > time out, or the multiget itself?
> >
> > How long is your timeout? :P
> >
> > I'll take a look at it as well and see about raising the limit in
> > `-o modern` after some performance tests. The default is from 2006.
> >
> > thanks!
> >
> > On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
> >
> > > The reqs_per_event setting was causing a client that was doing large
> > > batch-gets (of a few hundred keys) to see some timeouts. Since memcached
> > > will delay responding fully until other connections are serviced and our
> > > client will wait until the batch is done, we see some client-side
> > > timeouts for the users of our client library. Our solution has been to
> > > up the setting during startup, but just as a thought experiment I was
> > > asking if we could have done it dynamically to avoid losing data. At the
> > > moment there's quite a lot of machinery to change the setting (deploy,
> > > copy data over with our cache warmer, flip traffic, tear down old boxes)
> > > and I would have rather left everything as is and adjusted the setting
> > > on the fly until our client's problem was resolved.
> > > I'm interested in patching this specific setting to be settable, but
> > > having it fully dynamic in nature is not something I'd want to tackle.
> > > There's a natural tradeoff of latency for other connections / throughput
> > > for the one that is currently being serviced. I'm not sure it's a good
> > > idea to dynamically change that. It might cause unexpected behavior if
> > > one bad client sends huge requests.
> > >
> > >
> > > Scott Mansfield
> > > Product > Consumer Science Eng > EVCache > Sr. Software Eng
> > > {
> > > M: 352-514-9452
> > > E: [email protected]
> > > K: {M: mobile, E: email, K: key}
> > > }
> > >
> > > On Tue, Jan 24, 2017 at 11:53 AM, dormando <[email protected]> wrote:
> > > Hey,
> > >
> > > Would you mind explaining a bit how you determined the setting was
> > > causing an issue, and what the impact was? The default there is very old
> > > and might be worth a revisit (or some kind of auto-tuning) as well.
> > >
> > > I've been trending as much as possible to online configuration,
> > > including the actual memory limit. You can turn the lru crawler on and
> > > off, automoving on and off, manually move slab pages, etc. I'm hoping to
> > > make the LRU algorithm itself modifiable at runtime.
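> > >
> > > (For anyone following along, those existing knobs are plain text-protocol
> > > commands, roughly like the lines below; the slab class numbers are just
> > > an example.)
> > >
> > >     lru_crawler enable      # or: lru_crawler disable
> > >     slabs automove 1        # 0 turns the automover back off
> > >     slabs reassign 5 12     # move a page from slab class 5 to class 12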
> > >
> > > So yeah, I'd take a patch :)
> > >
> > > On Mon, 23 Jan 2017, 'Scott Mansfield' via memcached wrote:
> > >
> > > > There was a single setting my team was looking at today that we wish
> > > > we could have changed dynamically: the reqs_per_event setting. Right
> > > > now, in order to change it we need to shut down the process and start
> > > > it again with a different -R parameter. I don't see a way to change
> > > > many of the settings, though there are some that are ad-hoc changeable
> > > > through some stats commands. I was going to see if I could patch
> > > > memcached to be able to change the reqs_per_event setting at runtime,
> > > > but before doing so I wanted to check whether that's something you'd
> > > > be amenable to. I also didn't want to do something specifically for
> > > > that setting if it would be better to add it as a general feature.
> > > >
> > > > I see some pros and cons:
> > > >
> > > > One easy pro is that you can change things at runtime to fix
> > > > performance while not losing all of your data. If client request
> > > > patterns change, the process can react.
> > > >
> > > > A con is that the startup parameters won't necessarily match what the
> > > > process is doing, so they are no longer a useful way to determine the
> > > > settings of memcached. Instead you would need to connect and issue a
> > > > stats settings command to read them. It also introduces change in
> > > > places that may have previously never seen it; e.g. the reqs_per_event
> > > > setting is simply read at the beginning of the drive_machine loop. It
> > > > might need some kind of synchronization around it now instead. I don't
> > > > think it necessarily needs it on x86_64, but it might on other
> > > > platforms which I am not familiar with.
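> > > >
> > > > (Concretely, I was imagining something like the sketch below; the names
> > > > are made up and this is not a real patch. Keep the live value in an
> > > > atomic, have drive_machine read it once per pass, and let whatever
> > > > admin command we add store into it without any locking.)
> > > >
> > > >     /* Sketch only: a runtime-settable reqs_per_event. */
> > > >     #include <stdatomic.h>
> > > >
> > > >     static _Atomic int dyn_reqs_per_event = 20;  /* compiled-in default */
> > > >
> > > >     /* worker threads read this at the top of each drive_machine pass */
> > > >     static int current_reqs_per_event(void) {
> > > >         return atomic_load_explicit(&dyn_reqs_per_event,
> > > >                                     memory_order_relaxed);
> > > >     }
> > > >
> > > >     /* called from whatever admin command ends up changing the value */
> > > >     static void set_reqs_per_event(int n) {
> > > >         if (n > 0)
> > > >             atomic_store_explicit(&dyn_reqs_per_event, n,
> > > >                                   memory_order_relaxed);
> > > >     }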
> > > >