On Aug 9, 2011, at 1:50 PM, Dustin wrote:

> On Aug 8, 10:00 pm, neilmckee <[email protected]> wrote:
>
>>> Well, all the memcached operations are built on top of it... do you
>>> mean specifically multiget might call into the engine multiple times
>>> for a single "request"?
>>
>> Yes. That's one example. I think there were others where the
>> memcache operation resulted in more than one engine transaction.
>
> Allocation is a separate engine request from linking. You can just
> do whatever is sensible there, though. The binary protocol doesn't
> necessarily have packet responses for many engine requests, but packet
> requests map pretty well to engine requests. The text protocol has a
> special-case "multiget" which behaves differently.
>
>> Although there are already 30+ companies and open-source projects with
>> sFlow collectors I fully expect most memcached users will write their
>> own collection-and-analysis tools once they can get this data! Don't
>> you agree? So it's not about any one collector, it's about
>> defining a useful, scalable measurement that everyone can feel
>> comfortable using, even in production, even on the largest clusters.
>
> I don't think I've ever said anything that sounds like a
> disagreement with you. I just disagree that it's impossible to build
> memcached such that sFlow collection is an externally produced
> plugin. I could be wrong, but I don't understand why we can't do it
> with the engine interface or why we can't design another interface
> that would be useful.
Well, I think we really just need a hook that announces the
completion of a memcache-protocol operation, with args being:

 (1) the connection object (so we can read out transport, socket
     details, protocol...)
 (2) the operation (GET, SET, INCR etc.)
 (3) the key and key-length
 (4) the number of keys (usually 1, but >1 if this was part of a
     multi-get)
 (5) the value bytes
 (6) the status (STORED, NOT_FOUND, etc.)
 (7) perhaps something about the expiration deadline of the key(?)

(A sketch of what such a hook might look like follows at the end of
this message.)

If timing data is ever interesting (which it can be in the new
architecture) then we would want a start-of-memcache-operation hook
too.

It might be helpful if a plugin could attach variables to the
connection object, and also to the thread object. I don't know if
that is strictly necessary. I'm just looking at how it works today
and how it avoids locking/atomic_ops by adding fields to each of
those two structures. There is also a special lockless function to
roll together the sample_pool counter in thread.c. I guess that
means a threads_do(cb) iteration hook might be necessary too.

We would still need access to the counters. sFlow pushes them out
every n seconds (typically n=20, but it's configurable). That also
means it would be good to register for a 1-second tick callback, to
avoid having to run a separate thread just for that.

So if the extra function calls don't hurt performance too much this
might be a good way to do it in the future. However, I like your
next suggestion better....

>> On a positive note, it does seem like there is some consensus on
>> the value of random-transaction-sampling here. But do we have
>> agreement that this feed should be made available for external
>> consumption (i.e. the whole cluster sends to one place that is not
>> itself a memcached node), and that UDP should be used as the
>> transport? I'd like to understand if we are on the same page when
>> it comes to these broader architectural questions.
>
> I think I do agree with that. The question is whether we do that by
> making an sFlow interface or a sample interface?

Do you mean a hook that can be used by a plugin to receive randomly
sampled transactions? That would allow you to inline the random
sampling and eliminate most of the overhead. An sFlow plugin would
then just have to register for the feed; possibly sub-sample if the
internal 1-in-N rate was more aggressive than the requested sFlow
sampling-rate; marshal the samples into UDP datagrams; and send them
to the configured destinations. I like this solution because it
means the performance-critical part would be baked in by the experts
and fully tested with every new release.

But if you've already done the hard work, and everyone is going to
want the UDP feed, then why not offer that too? I probably made it
look hard with my bad coding, but all you have to do is XDR-encode
it and call sendto(). (A sketch of that plugin-side path also
follows below.)

> (And why can't everyone just use dtrace?)

I looked at this too, but the dtrace macros are not called with all
the fields we need, and I assumed they were immutable. (Also, the
SFLOW_SAMPLE() macro injects rather fewer lines of code when
enabled.)

Finally, I accept that the engine-pu branch is the focus of future
development, but... any thoughts on what to do for the 1.4.*
versions?

Neil
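
To make the argument list above concrete, here is a minimal sketch,
in C, of what such a completion hook and the 1-second tick could
look like. Every name below (mc_op_t, mc_complete_cb,
mc_register_complete_hook, and so on) is invented for illustration;
nothing like this exists in memcached's headers today.

    /* Hypothetical completion hook -- a sketch only.  None of these
     * names exist in memcached; they just make the proposed argument
     * list concrete. */

    #include <stddef.h>
    #include <stdint.h>
    #include <time.h>

    struct conn;                   /* memcached's per-connection object */

    typedef enum {                 /* (2) the operation */
        MC_OP_GET, MC_OP_SET, MC_OP_ADD, MC_OP_REPLACE,
        MC_OP_DELETE, MC_OP_INCR, MC_OP_DECR, MC_OP_OTHER
    } mc_op_t;

    typedef enum {                 /* (6) the status */
        MC_STATUS_STORED, MC_STATUS_NOT_STORED,
        MC_STATUS_FOUND,  MC_STATUS_NOT_FOUND, MC_STATUS_ERROR
    } mc_status_t;

    /* Called once per completed memcache-protocol operation. */
    typedef void (*mc_complete_cb)(
        struct conn *c,        /* (1) connection: transport, socket,
                                      protocol...                     */
        mc_op_t      op,       /* (2) operation                       */
        const char  *key,      /* (3) key...                          */
        size_t       keylen,   /*     ...and key length               */
        uint32_t     nkeys,    /* (4) usually 1, >1 inside a multiget */
        size_t       nbytes,   /* (5) value bytes                     */
        mc_status_t  status,   /* (6) STORED, NOT_FOUND, etc.         */
        time_t       exptime); /* (7) expiration deadline, 0 if none  */

    /* The 1-second tick, so a plugin never needs its own timer
     * thread just to push counters every n seconds. */
    typedef void (*mc_tick_cb)(void);

    void mc_register_complete_hook(mc_complete_cb cb);
    void mc_register_tick_hook(mc_tick_cb cb);

A start-of-operation hook for timing data would presumably take the
same connection object, so the two calls can be paired up per
connection.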

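And here is a rough sketch of how small the plugin side could be if
memcached baked in a 1-in-N sample feed. Again, every name is
invented, and three htonl()'d fields stand in for the real sFlow XDR
structures; this only illustrates the sub-sample / encode / sendto()
path described above, not a real sFlow encoder.

    /* Hypothetical plugin side, a sketch with invented names.
     * Assumes memcached hands us every 1-in-internal_rate
     * transaction and the collector asked for 1-in-sflow_rate. */

    #include <arpa/inet.h>     /* htonl() */
    #include <netinet/in.h>    /* struct sockaddr_in */
    #include <stdint.h>
    #include <stdlib.h>        /* random() */
    #include <sys/socket.h>    /* sendto() */

    static int sflow_fd;                  /* UDP socket, already open */
    static struct sockaddr_in collector;  /* configured destination   */
    static uint32_t internal_rate = 100;  /* memcached's built-in 1-in-N */
    static uint32_t sflow_rate    = 400;  /* rate the collector asked for */

    static void sample_cb(uint32_t op, uint32_t status, uint32_t nbytes)
    {
        /* Sub-sample when the internal feed is more aggressive than
         * the requested rate: keep internal_rate out of every
         * sflow_rate feed samples (here, 1 in 4), for an effective
         * rate of 1-in-sflow_rate. */
        if (sflow_rate > internal_rate &&
            (uint32_t)(random() % sflow_rate) >= internal_rate)
            return;

        /* XDR is just 4-byte big-endian fields, so "XDR-encode it"
         * comes down to a run of htonl()s.  A real sFlow datagram
         * has more structure; three fields stand in for it here. */
        uint32_t buf[3];
        buf[0] = htonl(op);
        buf[1] = htonl(status);
        buf[2] = htonl(nbytes);

        sendto(sflow_fd, buf, sizeof(buf), 0,
               (struct sockaddr *)&collector, sizeof(collector));
    }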