Comment #3 on issue 202 by [email protected]: TOP_KEYS feature fixes http://code.google.com/p/memcached/issues/detail?id=202
It's worth thinking about why measurements like TOP_KEYS are important: they exist to help improve performance. If enabling TOP_KEYS cuts performance by 50%, it's hard to justify turning it on, since the feature provides limited data and is difficult to manage in an operational setting (it can't easily be enabled, disabled or reconfigured).
It is convenient to have the measurements calculated by the server and made available through the memcache protocol, but that convenience comes at a huge cost. Shifting the analysis away from the servers gives you a great deal more flexibility, with minimal overhead on the server. For example, in addition to reporting TOP_KEYS, you can analyze sFlow data to report on top missed keys, which is very helpful for improving cache hit rates. Calculating additional metrics from sFlow data involves no additional work on the servers, whereas each additional metric like TOP_KEYS that you compute on the server cuts performance by a further 50%.
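To make the off-server analysis concrete, here is a rough sketch in Python of how a collector might derive both top keys and top missed keys from the same sampled operations. It assumes the samples have already been decoded into hypothetical (key, hit) tuples and that the sampling rate is known; the function name and the tuple layout are illustrative only and are not the actual sFlow record format.

```python
from collections import Counter


def top_keys(samples, sampling_rate, n=3):
    """Estimate top keys and top missed keys from sampled operations.

    samples: iterable of (key, hit) tuples (hypothetical decoded form),
             taken at roughly 1-in-sampling_rate operations.
    Returns two lists of (key, estimated_count), most frequent first.
    """
    all_ops = Counter()
    misses = Counter()
    for key, hit in samples:
        all_ops[key] += 1
        if not hit:
            misses[key] += 1

    def scaled(counter):
        # Scale sample counts up by the sampling rate to estimate totals.
        return [(k, c * sampling_rate) for k, c in counter.most_common(n)]

    return scaled(all_ops), scaled(misses)


# Made-up sample data, for illustration only.
samples = [
    ("user:1", True), ("user:1", True), ("user:2", False),
    ("user:1", False), ("user:2", False), ("img:9", True),
]
top, top_missed = top_keys(samples, sampling_rate=1000)
```

Both reports come from one pass over the same samples, so adding a third metric (say, top keys by value size, if the samples carried it) would be a change to the collector only.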
In any case, it is likely that you would use an external application to analyze the performance metrics and produce charts and reports regardless of whether memcache or sFlow is used to transport the metrics.
What is the overhead of inserting an extra engine in the chain? My concern would be that the cost of adding the instrumentation as a module might be high, which would reduce the value of the instrumentation. The optimal location for sFlow would be in the protocol engine where the counters are updated, since the sFlow hook in the performance path essentially involves maintaining one additional counter.
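To illustrate what "one additional counter" means in practice, here is a minimal sketch of a countdown-style sampling hook. It is written in Python for readability rather than as memcached code, and the class name, the emit callback and the uniform choice of skip values are all assumptions made for this example.

```python
import random


class SamplingHook:
    """Hypothetical per-operation hook that samples about 1 in mean_rate ops."""

    def __init__(self, mean_rate, emit, rng=None):
        self.mean_rate = mean_rate
        self.emit = emit                  # called only for sampled operations
        self.rng = rng or random.Random()
        self.skip = self._next_skip()

    def _next_skip(self):
        # Uniform on [1, 2*mean_rate - 1], so the average skip is mean_rate.
        return self.rng.randint(1, 2 * self.mean_rate - 1)

    def on_operation(self, key, hit):
        # Common case: one decrement and one comparison, nothing else.
        self.skip -= 1
        if self.skip > 0:
            return
        # Rare case: hand the sample off and pick the next random skip.
        self.emit((key, hit))
        self.skip = self._next_skip()


# Usage: collect samples in a list instead of sending them to a collector.
collected = []
hook = SamplingHook(mean_rate=10, emit=collected.append, rng=random.Random(0))
for i in range(10000):
    hook.on_operation("key:%d" % (i % 50), hit=(i % 3 != 0))
```

The point of the sketch is that the expensive work (building and exporting a sample) happens only on the rare sampled operation, while every other operation pays for a decrement and a branch.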
