> Well, sounds like whatever process was asking for that data is dead (and
> possibly pissing off a customer) so you should indeed figure out what
> that's about.

Yeah, we’ll definitely hunt this one down. I’ll have to toss up a monitor to look for things in a write state for extended periods and then go do some tracing (rather than, say, waiting for it to actually break again). We *do* have some legitimately long-running (multi-hour) things going on, so can’t just say “long connection bad!”, but it would be nice if maybe those processes could slurp their entire response upfront or some such.

> I think another thing we can do is actually throw a refcounted-for-a-long-time
> item back to the front of the LRU. I'll try a patch for that this weekend. It should
> have no real overhead compared to other approaches of timing out connections.

Is there any reason you can’t do “if refcount > 1 when walking the end of the tail, send to the front” without requiring ‘refcounted for a long time’ (with, of course, still limiting it to 5ish actions)? It seems like this would be pretty safe, since generally stuff at the end of LRU shouldn’t have a refcount, and then you don’t need extra code for figuring out how long something has been refcounted.

I guess there’s a slightly degenerate case in there, which is that if you have a small slab that’s 100% refcounted, you end up cycling a bunch of pointers every write just to run the LRU in a big circle and never write anything (similar to the case you suggest in your last paragraph), but that’s a situation I’m totally willing to accept. ;)

Anyhow, looking forward to a patch, and will gladly help test!

-j
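
P.S. To make the "send to the front" idea concrete, here is a rough sketch of the tail walk described above. It is a toy, not memcached's actual code: `item_t`, `lru_t`, `lru_find_victim` and `MAX_TRIES` are made-up names, and it assumes a linked item carries a refcount of 1 on its own, so anything above 1 means some connection is still holding it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRIES 5 /* the "5ish actions" cap; the exact value is an assumption */

/* Hypothetical toy item, not memcached's real item struct. */
typedef struct item {
    struct item *prev;  /* toward the head (more recently used) */
    struct item *next;  /* toward the tail (less recently used) */
    unsigned refcount;  /* assumed: 1 = only linked, >1 = in use by someone */
    int id;
} item_t;

typedef struct {
    item_t *head;
    item_t *tail;
} lru_t;

/* Detach an item from the list, fixing up head/tail as needed. */
static void lru_unlink(lru_t *l, item_t *it) {
    if (it->prev) it->prev->next = it->next; else l->head = it->next;
    if (it->next) it->next->prev = it->prev; else l->tail = it->prev;
    it->prev = it->next = NULL;
}

/* Insert an item at the head (most recently used end). */
static void lru_push_front(lru_t *l, item_t *it) {
    it->prev = NULL;
    it->next = l->head;
    if (l->head) l->head->prev = it; else l->tail = it;
    l->head = it;
}

/*
 * Look at the tail up to MAX_TRIES times. A referenced tail item is
 * moved to the front and skipped; the first unreferenced one is
 * unlinked and returned for reuse. Returns NULL if the list is empty
 * or every item examined was referenced (the degenerate case).
 */
static item_t *lru_find_victim(lru_t *l) {
    assert(l != NULL);
    for (int tries = 0; tries < MAX_TRIES; tries++) {
        item_t *it = l->tail;
        if (it == NULL)
            return NULL;
        lru_unlink(l, it);
        if (it->refcount > 1) {
            lru_push_front(l, it); /* still in use: recycle to the front */
            continue;
        }
        return it; /* nobody holds it: safe to evict */
    }
    return NULL; /* gave up; caller would report out-of-memory */
}
```

In the fully refcounted small-slab case this just rotates up to `MAX_TRIES` pointers per write and returns NULL, which is the "run the LRU in a big circle" behaviour mentioned above.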
