Heya team,

We hit some production trouble with clients sending very large
multi-gets. Even with otherwise reasonable cell- and row-size limits,
even with maximum multi-action sizes in place, even with QoS and our
fancy IO-based Quotas, the pressure was enough to push over a region
server or three. It got me thinking that we need some kind of
pressure gauge in the RPC layer that can protect the RS. This
wouldn't be a QoS or Quota kind of feature; it's not about fairness
between tenants. Rather, it's a safety mechanism, a kind of pressure
valve. I wonder if something like this already exists, or maybe you
know of a ticket already filed with some existing discussion.

My napkin sketch is something like a metric that tracks the amount of
heap consumed by active request and response objects. When the metric
hits a limit, we start to reject new requests with a retryable
exception. I don't know if we want the overhead of tracking this
value exactly, so maybe the value is incremented only by new requests
and then decayed by some crude mechanism. Does Netty already have
something like this? I'd say this is in lieu of an actual streaming
RPC harness, but I think even a streaming system would benefit from
such a backpressure strategy.
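
To make that concrete, here's a minimal sketch of the shape I have in
mind. Everything below is hypothetical: RpcPressureValve,
RetryLaterException, and the decay behavior are invented names and
mechanics for illustration, not existing HBase classes.

import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of the pressure valve: accumulate an estimate of the heap
 * pinned by in-flight calls on admission, and decay it crudely over
 * time rather than tracking exact releases. All names hypothetical.
 */
public class RpcPressureValve {
  private final long limitBytes;
  private final AtomicLong pressureBytes = new AtomicLong();

  public RpcPressureValve(long limitBytes) {
    this.limitBytes = limitBytes;
  }

  /** Admit or reject an incoming call based on current pressure. */
  public void admit(long estimatedCallBytes) throws RetryLaterException {
    long pressure = pressureBytes.addAndGet(estimatedCallBytes);
    if (pressure > limitBytes) {
      // A rejected call never pins heap, so undo its contribution.
      pressureBytes.addAndGet(-estimatedCallBytes);
      throw new RetryLaterException(pressure, limitBytes);
    }
  }

  /** Crude decay in lieu of exact release tracking; run this from a
   *  scheduled task, e.g. every few hundred milliseconds. */
  public void decay() {
    pressureBytes.updateAndGet(v -> v / 2);
  }
}

/** Retryable rejection; stands in for whatever the RPC layer throws. */
class RetryLaterException extends Exception {
  RetryLaterException(long pressureBytes, long limitBytes) {
    super("in-flight call heap " + pressureBytes + "B over limit "
        + limitBytes + "B, retry later");
  }
}

The decay variant trades accuracy for not having to hook every
response-buffer release; an exact version would drop decay() and
instead decrement the counter in a release() callback once the
response has been flushed and its buffers freed.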

It occurs to me that I don't know the current state of active memory
tracking in the region server. I recall there was some work to make
the memstore and blockcache resize dynamically. Maybe this new system
adds a third component to that computation.
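
Purely as made-up arithmetic: if the tuner today splits the heap
between a memstore fraction and a blockcache fraction, the valve's
limit could become a third fraction in the same computation. The
names and numbers below are invented, not real config keys.

/** Hypothetical heap split; fractions must sum to well under 1.0 to
 *  leave headroom for everything else on the region server. */
public class HeapBudgetSketch {
  static final double MEMSTORE_FRACTION = 0.40;
  static final double BLOCK_CACHE_FRACTION = 0.35;
  static final double RPC_INFLIGHT_FRACTION = 0.10; // proposed 3rd piece

  static long limitFor(double fraction) {
    return (long) (Runtime.getRuntime().maxMemory() * fraction);
  }

  public static void main(String[] args) {
    System.out.println("valve limit: "
        + limitFor(RPC_INFLIGHT_FRACTION) + " bytes");
  }
}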

Thoughts? Ideas?

Thanks,
Nick
