On Aug 19, 2011, at 12:56 AM, dormando wrote:
>>>> So there's no need to hesitate if you can already do (1) today. Let's
>>>> face it, you have been very successful and there are rather a lot of
>>>> users who have already gotten past (2) :)
>>>
>>> Okay, I'm kinda tired of that argument. Just because you say something
>>> isn't possible, doesn't mean we can't make it work anyway. If you believe
>>> they're divergent, stop saying that they're divergent and prove it with
>>> examples. However I'd rather spend my time writing features than
>>> pretending to know if a theoretical patch will work or not.
>>>
>>> We want to work towards a system that can encompass a replacement for
>>> "stats cachedump". If we can design something which generates sflow as a
>>> subset, that'll be totally amazing! We can even use your patches as
>>> reference for creating a core shipped plugin.
>>>
>>> If people want to use sflow today, they can apply your patches and use it.
>>> As is the way with open source.
>>>
>>> -Dormando
>>
>>
>> I didn't say it wasn't possible.... but never mind all that. A
>> core-shipped plugin would be great. Let me know if there's anything I
>> can do to help.
>
> That was more strongly worded than I intended, I apologize; I don't agree
> that it's worth rushing. Not "rushing" is why we haven't already settled
> on TOPKEYS the way it is. I don't really intend to throw something else in
> there immediately.

No worries. I apologize for my impatience. You are right. There is no rush.

But you did ask for more specific examples, so for what it's worth, here are
some reasons why I think features for (1) in-production cluster-wide sampling
and (2) testing and troubleshooting should be kept as separate as possible:
A. They will rarely be used at the same time on the same node.
B. If they are used concurrently (e.g. troubleshooting a production node),
then using (2) should have no effect on (1).
C. The cluster-wide configuration used for (1) is likely to be very different
from the interactive configuration for (2).
D. Getting a feed of randomly-sampled transactions is probably the only thing
they will have in common. After that, (1) will simply send the sample over
UDP, while (2) might apply regex filtering, value-field analysis, various
tests on the expiration times and slab allocation, and finally stream results
out on a TCP connection - probably using some ASCII format instead of XDR.
E. Even for the part they do have in common, (1) is likely to want only a
handful of samples per second per node (e.g. 1-in-10000), while (2) is much
more likely to want an aggressive feed such as 1-in-10, or even 1-in-1. It
seems likely that this difference will affect the implementation. For
example, a per-transaction time-duration measurement would be unthinkable at
1-in-1, but could be quite acceptable at 1-in-50000.
F. Even if (2) is considered the higher priority, I think it's easier to see
how (1) can be completed and tied up in a bow. I should stress here that I'm
not expecting anyone to use my code! I just think you guys could knock (1)
out pretty easily and reap immediate benefits, while (2) could take a while
to crystallize.

Getting unnecessarily detailed, let's say you implemented the plugin sampling
something like this:

possibly_sample_transaction(connection, protocol, operation, key, value,
                            status)
{
    /* One random draw per transaction, tested against each registered
       consumer's own sampling threshold. */
    r = next_random(connection->thread);
    for (i = 0; i < num_sampling_plugins; i++) {
        consumer = sampling_plugins[i];
        if (r <= consumer->probability_threshold) {
            (*consumer->sample_callback)(connection, protocol, operation,
                                         key, value, status);
        }
    }
}
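
(Imagining, just for illustration, a consumer record along these lines - the
struct and the callback signature are invented here, not from my patches:)

struct sampling_plugin {
    /* take a sample when the per-transaction random draw r <= this */
    double probability_threshold;
    /* receives one sampled transaction */
    void (*sample_callback)(void *connection, int protocol, int operation,
                            const char *key, const char *value, int status);
};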

Compare that with the number of instructions and branches involved here:

possibly_sample_transaction(connection, protocol, operation, key, value,
                            status)
{
    /* Single consumer: one random draw, one compare, one branch. */
    if (next_random(connection->thread) <= probability_threshold) {
        take_sample(connection, protocol, operation, key, value, status);
    }
}

Or, if you allow one sampling probability to be treated specially and turned
into a countdown-to-next-sample, then you can do it this way and save even
more:

possibly_sample_transaction(connection, protocol, operation, key, value,
                            status)
{
    /* Decrement the per-thread countdown; take a sample only when it
       reaches zero, then reset it. */
    if (unlikely(--connection->thread->countdown == 0)) {
        connection->thread->countdown = compute_next_countdown();
        take_sample(connection, protocol, operation, key, value, status);
    }
}
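
In case it's useful, here's one way compute_next_countdown() might work (a
minimal sketch - the 1-in-10000 rate and the use of rand() are placeholders,
not a proposal): draw the skip from a geometric distribution, so that on
average 1-in-N transactions get sampled:

#include <math.h>
#include <stdlib.h>

/* Sketch: how many transactions to let pass before the next sample,
   geometrically distributed with mean 1/p (here p = 1/10000). */
static unsigned int compute_next_countdown(void)
{
    const double p = 1.0 / 10000.0;
    /* u is uniform in (0, 1]; the +1.0 keeps log(u) finite */
    double u = (rand() + 1.0) / ((double)RAND_MAX + 1.0);
    return (unsigned int)(log(u) / log(1.0 - p)) + 1;
}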

At this point you could easily turn it into a macro, so that there is no
extra function call in the critical path, just a decrement-and-test on
thread->countdown.
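
For illustration only (the macro name is invented), something like:

/* Inline the decrement-and-test: the common case costs one decrement and
   one well-predicted branch, with no function call. */
#define POSSIBLY_SAMPLE_TRANSACTION(conn, proto, op, key, value, status)  \
    do {                                                                  \
        if (unlikely(--(conn)->thread->countdown == 0)) {                 \
            (conn)->thread->countdown = compute_next_countdown();         \
            take_sample((conn), (proto), (op), (key), (value), (status)); \
        }                                                                 \
    } while (0)
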
I don't know if it matters so much to shave a few dozen cycles off the critical
path, but my point was just to illustrate that even in the small area of
overlap between (1) and (2) you might still be grateful someday if you kept
them entirely separate.

Thoughts?

Neil