[
https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959433#comment-15959433
]
Karan Mehta commented on HDFS-11622:
------------------------------------
I understood the following use case for such a requirement:
[non-RPC spans and mapping to multiple parents|https://github.com/opentracing/specification/issues/5].
{quote}
Another example is in HBase. HBase has a write-ahead log, where it does "group
commit." In other words, if HBase gets requests A, B, and C, it does a single
write-ahead log write for all of them. The WAL writes can be time-consuming
since they involve writing to an HDFS stream, which could be slow for any
number of reasons (network, error handling, GC, etc.).
{quote}
Since requests A, B and C can be started independently, they will be assigned
different trace IDs as well as span IDs. There is only one WAL write for all
of them, so it gets a single span with multiple parents, one pointing to each
request. I am unclear about the use of the trace ID at this point if all of
them can easily be traced via their parent span IDs. Even in cases where the
trace doesn't form a DAG and is just a linear chain of spans, the information
can still be tracked via the parent span ID.
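
To make the group-commit case concrete, here is a rough sketch of a WAL-write
span that lists requests A, B and C as its parents. {{GroupCommitSpan}} and
{{GroupCommitDemo}} are hypothetical names made up for this illustration; they
are not HTrace or OpenTracing types, and the span IDs are arbitrary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical span record for illustration only; not an HTrace type. */
final class GroupCommitSpan {
    final long spanId;
    final String description;
    final long[] parentSpanIds;

    GroupCommitSpan(long spanId, String description, long... parentSpanIds) {
        this.spanId = spanId;
        this.description = description;
        this.parentSpanIds = parentSpanIds.clone();
    }
}

final class GroupCommitDemo {
    /** Builds requests A, B, C and one WAL-write span whose parents are all three. */
    static Map<Long, GroupCommitSpan> buildExample() {
        Map<Long, GroupCommitSpan> index = new HashMap<>();
        index.put(1L, new GroupCommitSpan(1L, "request A"));
        index.put(2L, new GroupCommitSpan(2L, "request B"));
        index.put(3L, new GroupCommitSpan(3L, "request C"));
        // A single span for the shared WAL write, pointing back at every request.
        index.put(10L, new GroupCommitSpan(10L, "WAL write", 1L, 2L, 3L));
        return index;
    }

    /** Resolves the direct parents of a span by following its parent span IDs. */
    static List<String> parentDescriptions(GroupCommitSpan span,
                                           Map<Long, GroupCommitSpan> index) {
        List<String> result = new ArrayList<>();
        for (long parentId : span.parentSpanIds) {
            GroupCommitSpan parent = index.get(parentId);
            if (parent != null) {
                result.add(parent.description);
            }
        }
        return result;
    }
}
```

Following the parent span IDs from the WAL-write span reaches every request
without consulting a trace ID, which is the point raised above.
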
Although we have multiple parents, the way it should work is that all of them
relate to the same span ID. Commented code for future use in the Description
suggests that all the parents will be available at the time the
{{dataStreamer}} span starts. The {{DFSPacket}} initializes the parents field
when it is placing the data in {{dataQueue}}, with the line
{{packet.addTraceParent(Tracer.getCurrentSpanId())}}, thus getting the current
span ID from the {{ThreadLocal}}. At this point, I feel that we can also get
the trace ID and store it inside the {{DFSPacket}}. Any thoughts on this one?
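
A minimal sketch of that idea, under stated assumptions: {{SketchTraceContext}}
and {{SketchPacket}} are hypothetical stand-ins for the tracer's per-thread
state and for {{DFSPacket}}, not the real classes. The sketch also assumes that
a trace ID of 0 can be used to mean "nothing recorded yet".

```java
import java.util.Arrays;

/** Hypothetical per-thread trace state; an assumption, not the HTrace API. */
final class SketchTraceContext {
    private static final ThreadLocal<SketchTraceContext> CURRENT = new ThreadLocal<>();

    final long traceId;
    final long spanId;

    SketchTraceContext(long traceId, long spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    static void setCurrent(SketchTraceContext ctx) {
        CURRENT.set(ctx);
    }

    static SketchTraceContext current() {
        return CURRENT.get();
    }
}

/** Hypothetical packet that stores the trace ID next to its parent span IDs. */
final class SketchPacket {
    private long[] traceParents = new long[0];
    private long traceId = 0L; // assumption: 0 means "no trace ID recorded yet"

    /** Called on the writer thread while the packet is being queued. */
    void addTraceParent(SketchTraceContext ctx) {
        if (ctx == null) {
            return; // nothing is being traced on this thread
        }
        traceParents = Arrays.copyOf(traceParents, traceParents.length + 1);
        traceParents[traceParents.length - 1] = ctx.spanId;
        if (traceId == 0L) {
            traceId = ctx.traceId; // keep the first trace ID seen
        }
    }

    long[] getTraceParents() {
        return traceParents.clone();
    }

    long getTraceId() {
        return traceId;
    }
}
```

The streamer thread could then read {{getTraceId()}} from the packet instead
of passing the literal 0 when it starts its span. Which trace ID to keep when
the parents come from different traces is left open here.
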
> TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is
> lost
> ----------------------------------------------------------------------------------
>
> Key: HDFS-11622
> URL: https://issues.apache.org/jira/browse/HDFS-11622
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: tracing
> Reporter: Karan Mehta
>
> In the {{run()}} method of the {{DataStreamer}} class, the following code is
> written. {{parents\[0\]}} refers to the {{spanId}} of the parent span.
> {code}
> one = dataQueue.getFirst(); // regular data packet
> long parents[] = one.getTraceParents();
> if (parents.length > 0) {
>   scope = Trace.startSpan("dataStreamer", new TraceInfo(0, parents[0]));
>   // TODO: use setParents API once it's available from HTrace 3.2
>   // scope = Trace.startSpan("dataStreamer", Sampler.ALWAYS);
>   // scope.getSpan().setParents(parents);
> }
> {code}
> The {{scope}} starts a new TraceSpan with a traceId hardcoded to 0. Ideally
> it should be taken when {{currentPacket.addTraceParent(Trace.currentSpan())}}
> is invoked. This JIRA is to propose an additional long field inside the
> {{DFSPacket}} class which holds the parent {{traceId}}.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)