[jira] [Commented] (HDFS-11622) TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost

Karan Mehta (JIRA) Thu, 06 Apr 2017 10:52:06 -0700

    [ https://issues.apache.org/jira/browse/HDFS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959433#comment-15959433 ]


Karan Mehta commented on HDFS-11622:
------------------------------------

As I understand it, the following use case from [non-RPC spans and mapping to multiple parents|https://github.com/opentracing/specification/issues/5] motivates such a requirement:
{quote}
Another example is in HBase. HBase has a write-ahead log, where it does "group 
commit." In other words, if HBase gets requests A, B, and C, it does a single 
write-ahead log write for all of them. The WAL writes can be time-consuming 
since they involve writing to an HDFS stream, which could be slow for any 
number of reasons (network, error handling, GC, etc.).
{quote} 

Since requests A, B and C can be started independently, they will be assigned different trace IDs as well as different span IDs. There will be a single WAL write for all of them. It is represented by a single span with multiple parents, one pointing to each request's span. If every request can be traced through these parent span IDs, I am unclear what the trace ID adds at this point. Even when the trace does not form a DAG and is just a linear chain of spans, the relationship can still be tracked via the parent span ID.
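To make the point concrete, here is a minimal Java sketch. It assumes a hypothetical, simplified span model: a {{Span}} record with {{traceId}}, {{spanId}} and a {{parents}} array, plus a helper {{ancestors}}. This is not the HTrace API. The group-commit WAL span is given traceId 0, as {{DataStreamer}} does today. Even so, following the parent links alone still reaches requests A, B and C.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified span model (NOT the HTrace API): each span carries
// a traceId, its own spanId, and zero or more parent spanIds.
public class GroupCommitSpans {
    record Span(long traceId, long spanId, long[] parents) {}

    // Collects the spanIds of every ancestor reachable by following parent links.
    // Assumes the span graph is acyclic (a DAG or a linear chain).
    static List<Long> ancestors(Span span, List<Span> all) {
        List<Long> out = new ArrayList<>();
        for (long p : span.parents()) {
            for (Span s : all) {
                if (s.spanId() == p) {
                    out.add(p);
                    out.addAll(ancestors(s, all));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Requests A, B and C start independently, so each has its own traceId.
        Span a = new Span(100L, 1L, new long[0]);
        Span b = new Span(200L, 2L, new long[0]);
        Span c = new Span(300L, 3L, new long[0]);
        // One group-commit WAL write: a single span with three parents.
        // traceId is 0 here, mirroring the hardcoded value in DataStreamer.
        Span wal = new Span(0L, 4L, new long[] {1L, 2L, 3L});
        System.out.println(ancestors(wal, List.of(a, b, c, wal))); // [1, 2, 3]
    }
}
```

The same walk also handles a linear chain. A child of the WAL span would list 4, then 1, 2 and 3 as its ancestors.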

Although we have multiple parents, the way it should work is that all of them relate to the same span ID. The commented-out code left for future use in the Description suggests that all the parents will be available when the {{dataStreamer}} span starts. The parents field of a {{DFSPacket}} is populated when the packet is added to {{dataQueue}}. This happens via the line {{packet.addTraceParent(Tracer.getCurrentSpanId())}}, which reads the current trace context from the {{ThreadLocal}}. At that point, I think we can also read the trace ID and store it inside the {{DFSPacket}}. Any thoughts on this?
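A rough Java sketch of this proposal follows. {{SpanContext}}, {{CURRENT}}, {{Packet}}, {{enqueue}} and {{childContext}} are hypothetical stand-ins for HTrace's thread-local span, {{DFSPacket}} and {{DataStreamer}}. They are not real Hadoop or HTrace APIs. The writer captures both the trace ID and the span ID while still on its own thread. The streamer thread later builds the child span from the stored trace ID instead of 0.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical stand-ins (NOT real Hadoop/HTrace APIs) for the thread-local
// span, DFSPacket, and the DataStreamer side of the handoff.
public class PacketTraceSketch {
    // The caller thread's current (traceId, spanId); null when not tracing.
    record SpanContext(long traceId, long spanId) {}

    static final ThreadLocal<SpanContext> CURRENT = new ThreadLocal<>();

    static final class Packet {
        long traceId;                 // proposed extra field holding the parent traceId
        long[] parents = new long[0]; // parent spanIds, as DFSPacket keeps today

        // Records the parent spanId together with the traceId it belongs to.
        void addTraceParent(SpanContext ctx) {
            if (ctx == null) {
                return; // caller is not inside a traced span
            }
            long[] grown = Arrays.copyOf(parents, parents.length + 1);
            grown[parents.length] = ctx.spanId();
            parents = grown;
            traceId = ctx.traceId(); // last writer wins if parents span several traces
        }
    }

    // Writer side: capture the context while still on the caller's thread.
    static void enqueue(Deque<Packet> dataQueue, Packet p) {
        p.addTraceParent(CURRENT.get());
        dataQueue.addLast(p);
    }

    // Streamer side: derive the child span's context from the packet rather
    // than hardcoding traceId 0. newSpanId is supplied by the caller here.
    static SpanContext childContext(Packet p, long newSpanId) {
        return p.parents.length > 0 ? new SpanContext(p.traceId, newSpanId) : null;
    }

    public static void main(String[] args) {
        Deque<Packet> dataQueue = new ArrayDeque<>();
        CURRENT.set(new SpanContext(42L, 7L)); // pretend the writer is inside a traced span
        enqueue(dataQueue, new Packet());
        CURRENT.remove();

        Packet one = dataQueue.getFirst();
        SpanContext child = childContext(one, 8L);
        System.out.println(child.traceId() + " " + one.parents[0]); // 42 7
    }
}
```

If a packet collects parents from several traces, this sketch keeps only the last trace ID. That is the same ambiguity the group-commit case raises.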

> TraceId hardcoded to 0 in DataStreamer, correlation between multiple spans is lost
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-11622
>                 URL: https://issues.apache.org/jira/browse/HDFS-11622
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tracing
>            Reporter: Karan Mehta
>
> The {{run()}} method of the {{DataStreamer}} class contains the following code.
> {{parents\[0\]}} refers to the {{spanId}} of the parent span.
> {code}
>               one = dataQueue.getFirst(); // regular data packet
>               long parents[] = one.getTraceParents();
>               if (parents.length > 0) {
>                 scope = Trace.startSpan("dataStreamer", new TraceInfo(0, parents[0]));
>                 // TODO: use setParents API once it's available from HTrace 3.2
>                 // scope = Trace.startSpan("dataStreamer", Sampler.ALWAYS);
>                 // scope.getSpan().setParents(parents);
>               }
> {code}
> The {{scope}} starts a new trace span whose traceId is hardcoded to 0. Ideally,
> the traceId should be captured when {{currentPacket.addTraceParent(Trace.currentSpan())}}
> is invoked. This JIRA proposes adding a long field to the {{DFSPacket}} class
> that holds the parent {{traceId}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)