[
https://issues.apache.org/jira/browse/KUDU-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823553#comment-16823553
]
Andrew Wong commented on KUDU-2782:
-----------------------------------
We've chatted about this asynchronously over a doc, so I thought I'd post some
notes that Mike lined up:
We should probably use the [OpenTracing|https://opentracing.io/] API, which is
compatible with Jaeger, DataDog, and Zipkin (the HTrace project has been
retired).
* *Code:* Add fields for the trace id to the RPC requests, and instrument using
[opentracing-cpp|https://github.com/opentracing/opentracing-cpp] and
[opentracing-java|https://github.com/opentracing/opentracing-java] or similar.
Try to integrate any new instrumentation with the existing Kudu tracing code
(TRACE and/or TRACE_EVENT).
* *Possible progression:*
** Start off by tracing only replication between leaders and followers. Don’t
integrate with OpenTracing yet; instead, just add trace metadata to the RPC
requests and/or responses, and hook it up so that we sample distributed traces
into the log file (or something similar) by logging the trace ids. We can then
use the traces by manually collecting the trace messages from across the
cluster and associating them by trace id. Try to make it possible to trigger
sampling / logging from any point in the event chain. We can also trigger
logging using our existing mechanism for logging slow activity.
** Next, propagate tracing from tablet server write requests and associate it
with the replication tracing.
** Next, propagate write request tracing information from clients (C++ and
Java) through to the servers.
** Instrument server write request traces with the OpenTracing C++ API.
** Instrument client libraries with OpenTracing APIs.
** Take the same steps as above for scan requests.
** Can we generically instrument tracing for the many other (simpler) RPC APIs?
Leader elections would be interesting.
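To make the first step of the progression concrete, here is a minimal sketch in
Python (chosen only for brevity; the real work would be in the C++ and Java
code). Everything in it is hypothetical: TraceContext, for_outgoing_rpc,
force_sample, the log line format, and the node names are illustrative
assumptions, not existing Kudu APIs. It shows trace metadata riding along with
a replication RPC, a sampling decision that any node in the chain can override,
and manual collection of log lines by trace id.

```python
import random
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TraceContext:
    """Hypothetical trace metadata attached to an RPC request or response."""
    trace_id: str
    parent_span_id: Optional[str]
    sampled: bool


def new_id() -> str:
    return uuid.uuid4().hex[:16]


def start_trace(sample_rate: float, rng: random.Random) -> TraceContext:
    # The sampling decision is made once at the root and then propagated.
    return TraceContext(new_id(), None, rng.random() < sample_rate)


def for_outgoing_rpc(ctx: TraceContext, span_id: str) -> TraceContext:
    # The caller's span becomes the parent of the work done by the callee.
    return TraceContext(ctx.trace_id, span_id, ctx.sampled)


def force_sample(ctx: TraceContext) -> TraceContext:
    # Any node in the chain may turn logging on, e.g. after a slow operation.
    return TraceContext(ctx.trace_id, ctx.parent_span_id, True)


def trace_line(ctx: TraceContext, span_id: str, node: str,
               event: str) -> Optional[str]:
    if not ctx.sampled:
        return None
    parent = ctx.parent_span_id or "-"
    return (f"trace_id={ctx.trace_id} span_id={span_id} "
            f"parent_span_id={parent} node={node} event={event}")


def emit(logs: Dict[str, List[str]], node: str, line: Optional[str]) -> None:
    # Stand-in for writing to that node's log file.
    if line is not None:
        logs.setdefault(node, []).append(line)


def replicate(ctx: TraceContext, leader: str, followers: List[str],
              logs: Dict[str, List[str]]) -> None:
    leader_span = new_id()
    emit(logs, leader, trace_line(ctx, leader_span, leader, "replicate"))
    outgoing = for_outgoing_rpc(ctx, leader_span)
    for follower in followers:
        emit(logs, follower,
             trace_line(outgoing, new_id(), follower, "append"))


def collect_trace(logs: Dict[str, List[str]], trace_id: str) -> List[str]:
    # Emulates manually grepping every node's log for one trace id.
    needle = f"trace_id={trace_id} "
    return [l for lines in logs.values() for l in lines if needle in l]
```

Calling replicate() with a sampled context and then collect_trace() with its
trace id returns one line for the leader and one per follower; an unsampled
context leaves the logs untouched unless some node calls force_sample() first.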
*Tricky things:* How to propagate tracing in the context of batching: client
RPC batching and WAL group commit.
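One possible way to handle batching, sketched below under stated assumptions,
is to give the batch its own span and have it record a link to every sampled
operation it carries, similar in spirit to the FollowsFrom reference type in
OpenTracing. This is only an illustration of the idea, not a proposal for how
Kudu should do it; OpTrace, BatchSpan, GroupCommitBatcher, and the log line
format are all hypothetical names.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class OpTrace:
    """Hypothetical trace metadata for one operation waiting in a batch."""
    trace_id: str
    span_id: str
    sampled: bool


@dataclass
class BatchSpan:
    """One span for the whole batch, linked to each sampled operation."""
    batch_id: str
    links: List[Tuple[str, str]] = field(default_factory=list)


class GroupCommitBatcher:
    def __init__(self) -> None:
        self._pending: List[OpTrace] = []
        self._next_batch = itertools.count(1)

    def add(self, op: OpTrace) -> None:
        self._pending.append(op)

    def flush(self) -> Tuple[BatchSpan, List[str]]:
        batch = BatchSpan(f"batch-{next(self._next_batch)}")
        size = len(self._pending)
        lines: List[str] = []
        for op in self._pending:
            if not op.sampled:
                continue
            # Link the batch to the op's trace, and log a line under the op's
            # own trace id so it shows up when that trace is collected.
            batch.links.append((op.trace_id, op.span_id))
            lines.append(f"trace_id={op.trace_id} span_id={op.span_id} "
                         f"batch_id={batch.batch_id} batch_size={size} "
                         f"event=group_commit")
        self._pending = []
        return batch, lines
```

With this shape, each sampled trace still gets its own log line for the commit,
and the shared batch id makes it possible to see which operations were
committed together.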
> Implement distributed tracing support in Kudu
> ---------------------------------------------
>
> Key: KUDU-2782
> URL: https://issues.apache.org/jira/browse/KUDU-2782
> Project: Kudu
> Issue Type: Task
> Components: ops-tooling
> Reporter: Mike Percy
> Priority: Major
>
> It would be useful to implement distributed tracing support in Kudu,
> especially something like OpenTracing support that we could use with Zipkin,
> Jaeger, DataDog, etc. Particularly useful would be auto-sampled and on-demand
> traces of write RPCs since that would help us identify slow nodes or hotspots
> in the replication group and troubleshoot performance and stability issues.