Sorry for the wall of text; if you don't have the time to read all of it,
please just read and respond to the last part: *How to handle the
migration.*

Hi!

I've been investigating the migration to OpenTelemetry for some time.

If nothing else, I have gained an appreciation for the OpenTelemetry API
and for the separation it provides between the instrumentation and
implementation sides.



*The present state*

The current implementation is a super tightly coupled mess.
(Apologies to the authors; I'm sure there was no better option available at
the time.)
AFAICT the intended operation is:

   - The HTrace 3 API is used for instrumentation, and Spans are created.
   (There seems to be a lot of manual context wrangling going on; I can see
   why HTrace was abandoned)
   - TraceSpanReceiver is registered (from Java code) with HTrace; it
   receives the tracing events and pushes them into a queue.
   - TraceWriter is started, polls the queue, and periodically dumps the
   spans into the SYSTEM.TRACING_STATS table.
   - This is also not happening by default, as the SYSTEM.TRACING_STATS
   table never gets created (see below)

To make things even more confusing, the metrics system is also tied into
this.

   - We register PhoenixMetricsSink as a Metrics2 sink in the properties
   file
      - This is where SYSTEM.TRACING_STATS gets created
      - The properties file references *PhoenixTableMetricsWriter*, but
      that class does not exist; writing is done by PhoenixMetricsSink
      directly
   - Metrics are pushed to PhoenixMetricsSink, which collects them and
   periodically flushes them into SYSTEM.TRACING_STATS
   - This is not happening by default either, probably because the
   properties files are not put on the classpath

I can kind of see the point: if I assume that no external distributed trace
and metrics handling services are available, then this is a reasonable way
to collect all traces from the cluster, though using the same table for
traces and metrics is still iffy.
I didn't check whether the HTrace context is propagated over RPC. Older
HBase clients probably propagate the HTrace context automatically (though
I'm not sure whether HTrace 3 and 4 are compatible in this regard); HBase
2.5 and later certainly don't, as they expect an OpenTelemetry context.


*OpenTelemetry tracing and its challenges*
In 2023, I don't think that collecting traces and metrics in an SQL table
adds much value.
There are several projects that specialize in collecting, processing, and
visualizing distributed traces and/or metrics, and they provide all kinds
of value-added features on top (I assume dumping the data into an SQL table
would also be possible).

OpenTelemetry splits the tracing functionality into two parts:
- opentelemetry-api includes everything needed to create spans, add data to
them, and (auto-)propagate them, but omits everything else, including the
ability to turn tracing on or off.
- opentelemetry-sdk includes the full implementation, the code for
collecting and processing the recorded data, and most importantly the
ability to define samplers.
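
To give a feel for the api side: instrumentation code looks roughly like
the sketch below, and it compiles against opentelemetry-api alone (the
tracer name, span name, and class are made up for the example, they are not
existing Phoenix code):

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class TracingSketch {
    // Compiles and runs against opentelemetry-api alone; without an SDK on
    // the classpath (e.g. via the agent) every call here is a cheap no-op.
    private static final Tracer TRACER =
        GlobalOpenTelemetry.getTracer("org.apache.phoenix");

    public static void executeWithSpan(Runnable work) {
        Span span = TRACER.spanBuilder("executeQuery").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            work.run(); // child spans created here pick up this context
        } finally {
            span.end();
        }
    }
}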

The preferred way to enable sampling seems to be adding the
opentelemetry-agent as a Java agent (it bundles opentelemetry-sdk), and
configuring its behaviour via environment variables or system properties.
opentelemetry-agent also includes instrumentation for several libraries
that do not support it natively.
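
For reference, on the client JVM this amounts to something like the command
line below. The property keys and the sampler names are the standard
agent/autoconfigure ones; the paths and values are just examples:

java -javaagent:/path/to/opentelemetry-javaagent.jar \
     -Dotel.service.name=phoenix-client \
     -Dotel.traces.exporter=otlp \
     -Dotel.traces.sampler=traceidratio \
     -Dotel.traces.sampler.arg=0.01 \
     ... (the application as usual)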

HBase uses opentelemetry-agent, and does not add any additional features
over what is available there.
It is the application's responsibility to provide custom samplers if it
wants to influence the sampling decision (i.e. to turn it on or off).

The first (head) sampling decision happens when we open the first Span.
OpenTelemetry has the mechanisms to propagate the sampling decision (the
sampled flag), and includes a default sampler that honors the sampled flag
from the context (set in the parent).

It is possible to set up opentelemetry-agent for the HBase services so that
they honor the sampling bit set by the client, which means that we only
really have to worry about sampling for the first Span, which is opened on
the client.
Unless the client application has already opened a Span, for Phoenix this
typically happens when an SQL statement is executed (though we can add
tracing elsewhere if we want to).
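
My expectation (based on the standard autoconfigure sampler names; I
haven't tried it on a cluster yet) is that configuring the agent on the
HBase services with the stock parent-based sampler is enough for this:

-Dotel.traces.sampler=parentbased_always_off

That should record the server-side children of whatever the client sampled,
while never starting a sampled trace from the server side on its own.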

This static configuration is not ideal for HBase, and even less so for
Phoenix, which has SQL statements to turn tracing on and off and to set the
tracing probability.
We need to somehow add custom samplers so that we can turn tracing on/off
and influence the tracing frequency (probability).

I can see two solutions:
- Find an existing configurable sampler that we can add
- Write one.
It only takes a few lines to add the functionality we need on top of the
default samplers, and it is possible to register the result as a service so
that it can be enabled via the standard configuration mechanism.
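
To give an idea of the size, a runtime-switchable sampler is roughly the
sketch below. The class name and the setProbability() hook are made up for
the example; the Sampler interface and the delegate samplers are the stock
ones from opentelemetry-sdk:

import java.util.List;

import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.sdk.trace.samplers.SamplingResult;

// Sketch of a sampler whose behaviour could be changed at runtime, e.g.
// from the Phoenix "trace on/off" statements. Not actual Phoenix code.
public class PhoenixDynamicSampler implements Sampler {

    private volatile Sampler delegate = Sampler.alwaysOff();

    // Would be wired to the tracing on/off and probability settings.
    public void setProbability(double probability) {
        this.delegate = probability <= 0d
            ? Sampler.alwaysOff()
            : Sampler.traceIdRatioBased(probability);
    }

    @Override
    public SamplingResult shouldSample(Context parentContext, String traceId,
            String name, SpanKind spanKind, Attributes attributes,
            List<LinkData> parentLinks) {
        // Delegate the actual decision to the stock samplers from the SDK.
        return delegate.shouldSample(parentContext, traceId, name, spanKind,
            attributes, parentLinks);
    }

    @Override
    public String getDescription() {
        return "PhoenixDynamicSampler{" + delegate.getDescription() + "}";
    }
}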

The problem is that writing a Sampler means depending on the full
opentelemetry-sdk artifact instead of opentelemetry-api.
I think we can work around this by making a new maven module/artifact for
the sampler, so that we can avoid adding the full -sdk dependency to
phoenix-core.
We can then add this module to phoenix-client. Tracing requires adding
opentelemetry-agent to the classpath anyway, so the sampler never gets
called unless the SDK classes are present.
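
For completeness, the "register it as a service" part from above would live
in the same module; as far as I can tell it is just the SDK autoconfigure
SPI, roughly the sketch below (the provider class and the "phoenix" sampler
name are made up; the SPI interface comes from the OpenTelemetry
autoconfigure SPI artifact):

import io.opentelemetry.sdk.autoconfigure.spi.ConfigProperties;
import io.opentelemetry.sdk.autoconfigure.spi.traces.ConfigurableSamplerProvider;
import io.opentelemetry.sdk.trace.samplers.Sampler;

// Sketch: lets -Dotel.traces.sampler=phoenix select our sampler when the
// agent/SDK is present. Also needs a matching entry in
// META-INF/services/io.opentelemetry.sdk.autoconfigure.spi.traces.ConfigurableSamplerProvider
public class PhoenixSamplerProvider implements ConfigurableSamplerProvider {

    @Override
    public Sampler createSampler(ConfigProperties config) {
        return new PhoenixDynamicSampler();
    }

    @Override
    public String getName() {
        // the value users would put into otel.traces.sampler
        return "phoenix";
    }
}

With that in place the sampler could be selected via
-Dotel.traces.sampler=phoenix, the same way the built-in ones are.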


*How to handle the migration*
Supporting HTrace and OpenTelemetry in the same repo would be a huge pain.
It's probably doable, but we'd have to shim everything and push all tracing
code into compatibility modules, and I'd prefer to avoid that.

My suggestion is that we completely rip out the current tracing code in
5.2, including the metrics sink and the SYSTEM.TRACING_STATS support, and
replace it with OpenTelemetry, while doing nothing in 5.1.
This would possibly break tracing propagation for HBase 2.4, but I am not
sure anyone would notice that.
At CLDR, we completely removed HTrace from Phoenix about a year ago for CVE
reasons, and I don't remember it causing any problems for anyone.

Another alternative is to simply backport all changes to 5.1, but that
would remove existing functionality in a patch release.
That would also be fine by me, but it would break some expectations
regarding versioning.

Looking forward to your input.

Istvan
