[
https://issues.apache.org/jira/browse/HTRACE-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295831#comment-14295831
]
Colin Patrick McCabe commented on HTRACE-92:
--------------------------------------------
I always assumed that we would implement a special bit in the HBase / HDFS
request protobuf that hard-disables tracing, even if the sampler says it should
trace. Then we would set that bit for requests generated by trace sinks
themselves. I don't think we actually got around to doing this in HDFS... I've
been working more on the htraced trace sink lately, which doesn't have
this "recursion" issue since it is a separate system.
Did you guys get a chance to check the code for {{TraceScope}} leaks? I am
really curious what the root cause of this issue is.
> Thread local storing the currentSpan is never cleared
> -----------------------------------------------------
>
> Key: HTRACE-92
> URL: https://issues.apache.org/jira/browse/HTRACE-92
> Project: HTrace
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Samarth Jain
> Attachments: Screen Shot 2015-01-27 at 11.20.36 PM.png
>
>
> In Apache Phoenix, we use HTrace to provide request-level trace information.
> The trace information (traceid, parentid, spanid, description), among other
> columns, is stored in a Phoenix table.
> Spans that are MilliSpans get persisted to the Phoenix table this way:
> MilliSpans -> PhoenixTraceMetricsSource -> PhoenixMetricsSink -> Phoenix Table
> While inserting the traces into the Phoenix table, we make sure that the
> upsert happening in the sink goes through a connection that doesn't have
> tracing on. This way, any spans created by executing these upsert statements
> on the client side are NullSpans.
> On the server side too, when these batched-up upsert statements are executed
> as batchMutate operations, they are not expected to have tracing on, i.e. the
> current spans should be null. However, we noticed that these current spans
> are not null, which results in an infinite loop. An example of such an
> infinite loop is the following:
> batchMutate -> FSHLog.append -> check tracing on (current span is not null)
> -> yes -> create MilliSpan -> do operation -> stop span -> publish to metrics
> source -> Phoenix metrics sink -> upsert statement -> batchMutate ->
> FSHLog.append ...
> My local cluster in fact dies because of this infinite loop!
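> To make the tracing check in that loop concrete, this is roughly the pattern
> on the append path (a sketch against the htrace 3.1.0 API, not the actual
> FSHLog code):
> {code}
> import org.apache.htrace.Trace;
> import org.apache.htrace.TraceScope;
>
> public class AppendSketch {
>   // If the thread-local current span is non-null, Trace.isTracing() returns
>   // true and a new span is created, which is later handed to the metrics
>   // source / Phoenix sink. A stale, already-closed span left behind on a
>   // pooled thread still passes this check, so the sink's own upsert
>   // re-enters the same path.
>   public void append(Runnable edit) {
>     TraceScope scope = null;
>     if (Trace.isTracing()) {                     // current span is not null
>       scope = Trace.startSpan("FSHLog.append");  // create span
>     }
>     try {
>       edit.run();                                // do operation
>     } finally {
>       if (scope != null) {
>         scope.close();                           // stop span -> publish
>       }
>     }
>   }
> }
> {code}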
> On examining the thread locals of the threads in the RPC thread pool, I saw
> that there were threads whose current spans had been closed at least an hour
> earlier. See the attached screenshot.
> The screenshot was taken at 11:20 PM and the thread had a current span whose
> stop time was 10:17 PM.
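> The same condition can be spotted from code; a rough debug sketch, assuming
> the 3.1.0 {{Span}} API ({{isRunning()}}, {{getStopTimeMillis()}},
> {{getDescription()}}):
> {code}
> import org.apache.htrace.Span;
> import org.apache.htrace.Trace;
>
> public class StaleSpanCheck {
>   /** Logs a warning when the calling thread still carries a span that has
>    *  already been stopped, like the one in the attached screenshot. */
>   public static void warnIfStale() {
>     Span current = Trace.currentSpan();
>     if (current != null && !current.isRunning()) {
>       System.err.println(Thread.currentThread().getName()
>           + " still holds closed span \"" + current.getDescription()
>           + "\" stopped at " + current.getStopTimeMillis());
>     }
>   }
> }
> {code}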
> This brought up a couple of design issues/limitations in the HTrace API:
> 1) There is no good way to set (reset) the value of the thread-local current
> span to null. This is a huge issue, especially if we are reusing threads from
> a thread pool (see the sketch after the code block below).
> 2) Should we allow creating spans if the parent span is not running anymore,
> i.e. Span.isRunning() is false? In the example shown in the screenshot, the
> current span stored in the thread local is already closed.
> Essentially making isTracing() something like:
> {code}
> boolean isTracing() {
>   return currentSpan.get() != null && currentSpan.get().isRunning();
> }
> {code}
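> For issue 1, what we are really asking for is something like the sketch
> below; {{clearCurrentSpan()}} is hypothetical (no such method exists in the
> 3.1.0 API), it just shows where a reset hook would be called:
> {code}
> public class SpanLeakGuard {
>   /** Hypothetical operation: reset the thread-local current span to null.
>    *  Nothing like this exists in HTrace 3.1.0; the body is a placeholder. */
>   static void clearCurrentSpan() {
>     // would null out the Tracer's thread-local current span here
>   }
>
>   /** Wrap a pooled task so a leaked span cannot survive into the next task
>    *  that runs on the same thread. */
>   public static Runnable wrap(final Runnable task) {
>     return new Runnable() {
>       @Override
>       public void run() {
>         try {
>           task.run();
>         } finally {
>           clearCurrentSpan();
>         }
>       }
>     };
>   }
> }
> {code}
> A thread pool would then submit {{SpanLeakGuard.wrap(task)}} instead of
> {{task}}.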
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)