> Well, sounds like whatever process was asking for that data is dead (and
> possibly pissing off a customer) so you should indeed figure out what
> that's about.

Yeah, we’ll definitely hunt this one down. I’ll have to set up a monitor
that flags connections sitting in a write state for extended periods, and
then go do some tracing (rather than, say, waiting for it to actually break
again). We *do* have some legitimately long-running (multi-hour) things
going on, so we can’t just say “long connection bad!”, but it would be nice
if those processes could slurp their entire response up front or some such.
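For the monitor, something along these lines would do as a first pass. It's only a sketch: `struct conn_info`, its `state_since` field and `find_stuck_writers()` are hypothetical names, not memcached internals, and the connection state would have to be scraped from wherever we can actually get it. Since the multi-hour jobs will trip it too, the output is a "go trace this" list, not a kill list.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-connection bookkeeping; memcached's real conn struct differs. */
enum conn_state_hyp { CONN_READ, CONN_WRITE, CONN_CLOSED };

struct conn_info {
    int fd;
    enum conn_state_hyp state;
    time_t state_since;   /* hypothetical: when the connection entered its current state */
};

/* Find connections that have sat in the write state for longer than
 * `threshold` seconds. Returns the total number found; only the first
 * max_out fds are stored in `out`. Legitimately long-running clients will
 * show up here too, so treat the result as candidates for tracing. */
size_t find_stuck_writers(const struct conn_info *conns, size_t n,
                          time_t now, time_t threshold,
                          int *out, size_t max_out)
{
    size_t found = 0;
    assert(conns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (conns[i].state == CONN_WRITE &&
            now - conns[i].state_since > threshold) {
            if (found < max_out)
                out[found] = conns[i].fd;
            found++;
        }
    }
    return found;
}
```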


> I think another thing we can do is actually throw a refcounted-for-a-long-time
> item back to the front of the LRU. I'll try a patch for that this weekend. It
> should have no real overhead compared to other approaches of timing out
> connections.
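As I read it, the “refcounted for a long time” test boils down to a predicate like the one below. This is my guess at the shape, not your patch: `struct item_hyp`, the `refcount_since` timestamp and `should_bump_to_head()` are all hypothetical, and it assumes (as the `> 1` check further down implies) that a refcount of 1 means only the cache itself holds the item.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical item header; assumes refcount 1 = held only by the cache. */
struct item_hyp {
    unsigned short refcount;
    time_t refcount_since;    /* hypothetical: when an extra reference was first taken */
};

/* Should an item found at the LRU tail be moved back to the head? Only if
 * someone other than the cache holds it, and has held it for longer than
 * max_held seconds. */
bool should_bump_to_head(const struct item_hyp *it, time_t now, time_t max_held)
{
    assert(it != NULL);
    return it->refcount > 1 && now - it->refcount_since > max_held;
}
```

The extra timestamp (and keeping it current as references come and go) is exactly the bookkeeping the question below is trying to avoid.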

Is there any reason you can’t do “if refcount > 1 when walking the end of
the tail, send to the front” without requiring ‘refcounted for a long time’
(with, of course, still limiting it to 5ish actions)? It seems like this
would be pretty safe, since generally stuff at the end of LRU shouldn’t
have a refcount, and then you don’t need extra code for figuring out how
long something has been refcounted.
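To make the suggestion concrete, here's a rough sketch against a toy doubly linked LRU. None of these names (`struct lru_item`, `lru_pull_tail()`, `max_tries`) are memcached's actual code; `max_tries` stands in for the 5ish limit, and it again assumes a refcount of 1 means only the cache holds the item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down LRU; not memcached's actual item layout. */
struct lru_item {
    struct lru_item *prev;    /* toward the head (more recently used) */
    struct lru_item *next;    /* toward the tail (less recently used) */
    unsigned short refcount;  /* 1 = held only by the cache itself */
};

struct lru {
    struct lru_item *head;
    struct lru_item *tail;
};

static void lru_unlink(struct lru *l, struct lru_item *it)
{
    if (it->prev) it->prev->next = it->next; else l->head = it->next;
    if (it->next) it->next->prev = it->prev; else l->tail = it->prev;
    it->prev = it->next = NULL;
}

static void lru_push_head(struct lru *l, struct lru_item *it)
{
    it->prev = NULL;
    it->next = l->head;
    if (l->head) l->head->prev = it; else l->tail = it;
    l->head = it;
}

/* Walk up to max_tries items from the tail. Items someone else still holds
 * (refcount > 1) are sent to the head instead of being left in place, so
 * they stop clogging the tail. The first unreferenced item is unlinked and
 * returned for reuse; NULL means nothing was reclaimable in this pass. */
struct lru_item *lru_pull_tail(struct lru *l, int max_tries)
{
    assert(max_tries > 0);
    for (int tries = 0; tries < max_tries && l->tail != NULL; tries++) {
        struct lru_item *it = l->tail;
        if (it->refcount > 1) {
            lru_unlink(l, it);
            lru_push_head(l, it);
            continue;
        }
        lru_unlink(l, it);
        return it;
    }
    return NULL;
}
```

With a list of A, B, C (head to tail) where only C is referenced, one call moves C to the head and hands back B. If every item is referenced, each call just rotates `max_tries` items and returns NULL, which is the degenerate case below.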

I guess there’s a slightly degenerate case in there, which is that if you
have a small slab that’s 100% refcounted, you end up cycling a bunch of
pointers every write just to run the LRU in a big circle and never write
anything (similar to the case you suggest in your last paragraph), but
that’s a situation I’m totally willing to accept. ;)

Anyhow, looking forward to a patch, and will gladly help test!

-j

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.