Re: HTRACE-215 Simplify the Sampler type - discussion

Colin P. McCabe Mon, 27 Jul 2015 12:00:46 -0700

Hi Daniel,

The problem with the "T" in Sampler<T> is that it's
application-specific.  The code for each application needs to be
modified specifically to make use of a different T.  Ideally, Samplers
should be pluggable, so that you can use any sampler with any
HTrace-instrumented code.  For example, I might run a test application
with sampling set
to "always" but in production, I would run with a probability sampler
with some specific sampling rate.  But you can't do that when your
sampler depends on being passed some application-specific data.
You're stuck with only samplers that can work with that specific T.
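
A rough sketch of what "pluggable" means here, in Java.  Every name
below (SketchSampler, SketchSamplerFactory, and the "sampler" and
"sampler.fraction" keys) is made up for illustration and is not the
actual HTrace API.  The point is only that a sampler which takes no
application-specific argument can be chosen entirely from
configuration:

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical non-generic sampler: it needs no application-specific input.
interface SketchSampler {
    boolean next();
}

final class AlwaysSketchSampler implements SketchSampler {
    public boolean next() { return true; }
}

final class NeverSketchSampler implements SketchSampler {
    public boolean next() { return false; }
}

final class ProbabilitySketchSampler implements SketchSampler {
    private final double fraction;

    ProbabilitySketchSampler(double fraction) { this.fraction = fraction; }

    // Samples roughly `fraction` of the calls.
    public boolean next() {
        return ThreadLocalRandom.current().nextDouble() < fraction;
    }
}

final class SketchSamplerFactory {
    // The config keys used here are assumptions made for this sketch.
    static SketchSampler fromConfig(Map<String, String> conf) {
        String kind = conf.getOrDefault("sampler", "never");
        switch (kind) {
            case "always":
                return new AlwaysSketchSampler();
            case "probability":
                String f = conf.getOrDefault("sampler.fraction", "0.01");
                return new ProbabilitySketchSampler(Double.parseDouble(f));
            default:
                return new NeverSketchSampler();
        }
    }
}
```

With something like this, switching from "always" in a test run to
"probability" in production is a config change, and the application
code that asks the sampler for a decision stays the same.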

Consider a specific example: tracing Hadoop.  I'd like to be able to
turn on tracing in Hadoop just by changing a config key.  But if I'm
using a Sampler<T> with a non-trivial T, I can't do that.  I have to
tell the customer, "first apply this patch to your Hadoop code to add
the Ts, do a full build, and then put it into live production"...  The
customer won't even follow me to step #1, let alone deploying the
patched code in production.  It totally wrecks the usefulness of
HTrace if you need to rebuild your code to use it.

Another thing to think about is that we'd like to reduce the
"boilerplate code" needed to add HTrace to an application.  Ideally
the system would create the samplers you need from your
HTraceConfiguration, rather than requiring the application to create
and manage them manually.  Of course, applications should be able to
programmatically add and remove Samplers as well, but only if they
have a specific need to do that.

I think that tracing different events with different probabilities is
a nice feature.  There is a way to do that through the new API that I
think is cleaner.  You would create multiple Tracer objects (Tracer
will no longer be a singleton).  Each Tracer would be configured with a
ProbabilitySampler, but each would have a different sampling rate set.
For the Foo code, you would call fooTracer.newTopLevelSpan(...), for
the Bar code, you would call barTracer.newTopLevelSpan(...), and so
forth.  In the new API, spans are always created from a specific
Tracer and use the Samplers associated with that Tracer.
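
Roughly, the per-subsystem idea could look like the sketch below.
SketchTracer is a hypothetical stand-in, not the real Tracer class,
and its newTopLevelSpan only returns the sampling decision instead of
creating a span; the final API may differ.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for a per-subsystem Tracer; not the real HTrace class.
final class SketchTracer {
    private final String name;
    private final double fraction;  // sampling rate for this tracer only
    private long sampled = 0;

    SketchTracer(String name, double fraction) {
        this.name = name;
        this.fraction = fraction;
    }

    // Decides whether a new top-level span from this tracer is sampled.
    // A real tracer would create and return a span; this sketch only
    // reports the decision and counts how many spans were sampled.
    boolean newTopLevelSpan(String description) {
        boolean trace = ThreadLocalRandom.current().nextDouble() < fraction;
        if (trace) {
            sampled++;
        }
        return trace;
    }

    String name() { return name; }

    long sampledCount() { return sampled; }
}
```

The Foo code would hold something like new SketchTracer("foo", 0.5)
and the Bar code new SketchTracer("bar", 0.001).  Each call site only
knows which tracer it uses, not what the rate is.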

This is similar to having different Log objects in log4j.  Perhaps you
think the Foo system is not that interesting most of the time, so its
log level defaults to WARN.  But if you think you're having a problem
in the Foo system, you can set its log level to TRACE and then you see
all the log messages that the Foo system has.  Same thing here, except
that instead of Log objects, we have Tracer objects.  Instead of log
messages, we have trace spans.  But we still have a lot of flexibility
at runtime as a result of this.  And we don't need to recompile to
trace.
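
To illustrate the log-level analogy, here is a minimal sketch of a
tracer whose sampling rate can be changed while the process is
running.  AdjustableSketchTracer and setFraction are hypothetical
names for this example only; how HTrace actually exposes runtime
reconfiguration is not specified here.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical tracer with a runtime-adjustable sampling rate,
// analogous to changing a logger's level without a rebuild.
final class AdjustableSketchTracer {
    // volatile so a rate change made by one thread is seen by others
    private volatile double fraction;

    AdjustableSketchTracer(double fraction) {
        setFraction(fraction);
    }

    void setFraction(double fraction) {
        // Written this way so that NaN is rejected as well.
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        this.fraction = fraction;
    }

    boolean shouldTrace() {
        return ThreadLocalRandom.current().nextDouble() < fraction;
    }
}
```

A Foo tracer could sit at a rate of 0.0 most of the time and be raised
to 1.0 while a problem is being investigated, much like moving a
logger from WARN to TRACE.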

regards,
Colin

On Mon, Jul 27, 2015 at 11:33 AM, Daniel Lee <[email protected]> wrote:
> RE: https://issues.apache.org/jira/browse/HTRACE-215
>
> I was previously making use of this feature. I was using it to trace
> different types of inputs with different probabilities. It looks like
> now I'll have to move all tracing logic completely outside of
> htrace-related classes and only use the Always and Never samplers, which
> seems weird? Why even bother with providing ProbabilitySampler when
> (rand.nextDouble() < X ? AlwaysSampler.INSTANCE :
> NeverSampler.INSTANCE) is available.
>
> Daniel
