Hi Daniel,

The problem with the "T" in Sampler<T> is that it's application-specific. The code for each application has to be modified specifically to make use of a different T. Ideally, Samplers should be pluggable, so that you can use any sampler with any HTraced code. For example, I might run a test application with sampling set to "always", but in production I would run with a probability sampler with some specific sampling rate. You can't do that when your sampler depends on being passed application-specific data. You're stuck with only the samplers that can work with that specific T.

Consider a specific example: tracing Hadoop. I'd like to be able to turn on tracing in Hadoop just by changing a config key. But if I'm using a Sampler<T> with a non-trivial T, I can't do that. I have to tell the customer, "first apply this patch to your Hadoop code to add the Ts, do a full build, and then put it into live production"... The customer won't even follow me to step #1, let alone deploy the patched code in production. It totally wrecks the usefulness of HTrace if you need to rebuild your code to use it.

Another thing to think about is that we'd like to reduce the "boilerplate code" needed to add HTrace to an application. Ideally the system would create the samplers you need from your HTraceConfiguration, rather than requiring the application to create and manage them manually. Of course, applications should be able to programmatically add and remove Samplers as well, but only if they have a specific need to do that.

I think that tracing different events with different probabilities is a nice feature. There is a way to do that through the new API that I think is cleaner. You would create multiple Tracer objects (Tracer will no longer be a singleton). Each Tracer would be configured with a ProbabilitySampler, but each would have a different sampling rate set. For the Foo code, you would call fooTracer.newTopLevelSpan(...), for the Bar code, you would call barTracer.newTopLevelSpan(...), and so forth. In the new API, spans are always created from a specific Tracer and use the Samplers associated with that Tracer.

This is similar to having different Log objects in log4j. Perhaps you think the Foo system is not that interesting most of the time, so its log level defaults to WARN. But if you think you're having a problem in the Foo system, you can set its log level to TRACE and then you see all the log messages that the Foo system has. Same thing here, except that instead of Log objects, we have Tracer objects. Instead of log messages, we have trace spans.
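
To make the per-subsystem idea concrete, here is a rough sketch in Java. It is not the actual HTrace API: DemoSampler, DemoProbabilitySampler, DemoTracer, and the "<name>.sampler.fraction" config key are hypothetical stand-ins, and a plain Map stands in for HTraceConfiguration. The only point is that each tracer gets its sampling rate from configuration, so the rate can change without touching application code.

```java
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for a sampler that needs no application-specific input.
interface DemoSampler {
    boolean next();
}

// Samples with a fixed probability. Random.nextDouble() returns a value in
// [0.0, 1.0), so a rate of 1.0 always samples and a rate of 0.0 never does.
final class DemoProbabilitySampler implements DemoSampler {
    private final double rate;
    private final Random random = new Random();

    DemoProbabilitySampler(double rate) {
        this.rate = rate;
    }

    @Override
    public boolean next() {
        return random.nextDouble() < rate;
    }
}

// Hypothetical stand-in for a non-singleton tracer that owns its sampler.
final class DemoTracer {
    private final String name;
    private final DemoSampler sampler;

    private DemoTracer(String name, DemoSampler sampler) {
        this.name = name;
        this.sampler = sampler;
    }

    // Builds a tracer from configuration. The key format is an assumption
    // made for this sketch; a missing key means "never sample".
    static DemoTracer fromConfig(String name, Map<String, String> conf) {
        String key = name + ".sampler.fraction";
        double rate = Double.parseDouble(conf.getOrDefault(key, "0.0"));
        return new DemoTracer(name, new DemoProbabilitySampler(rate));
    }

    // Returns a label for the new span if this call was sampled, else null.
    // A real tracer would return a span object instead of a String.
    String newTopLevelSpan(String description) {
        return sampler.next() ? name + ":" + description : null;
    }
}
```

With something like this, the Foo code would call fooTracer.newTopLevelSpan("read") and the Bar code would call barTracer.newTopLevelSpan("read"), and raising the Foo sampling rate would only mean editing the "foo.sampler.fraction" entry in the configuration.
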
With per-subsystem Tracers, we still have a lot of flexibility at runtime, and we don't need to recompile to trace.

regards,
Colin

On Mon, Jul 27, 2015 at 11:33 AM, Daniel Lee <[email protected]> wrote:
> RE: https://issues.apache.org/jira/browse/HTRACE-215
>
> I was previously making use of this feature. I was using it to trace
> different types of inputs with different probabilities. It looks like
> now I'll either have move all tracing logic completely outside of
> htrace related classes and only use Always and Never sampler which
> seems weird? Why even bother with providing ProbabilitySampler when
> (rand.nextDouble() < X ? AlwaysSampler.INSTANCE :
> NeverSampler.INSTANCE) is available.
>
> Daniel
