[ 
https://issues.apache.org/jira/browse/HTRACE-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295597#comment-14295597
 ] 

Andrew Purtell commented on HTRACE-92:
--------------------------------------

I filed HBASE-12938

> Thread local storing the currentSpan is never cleared
> -----------------------------------------------------
>
>                 Key: HTRACE-92
>                 URL: https://issues.apache.org/jira/browse/HTRACE-92
>             Project: HTrace
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Samarth Jain
>         Attachments: Screen Shot 2015-01-27 at 11.20.36 PM.png
>
>
> In Apache Phoenix, we use HTrace to provide request level trace information. 
> The trace information (traceid, parentid, spanid, description) among other 
> columns is stored in a Phoenix table. 
> Spans that are MilliSpan get persisted to the phoenix table this way:
> MilliSpans-> PhoenixTraceMetricsSource-> PhoenixMetricsSink->Phoenix Table
> While inserting the traces into the phoenix table, we make sure that the 
> upsert happening in the sink goes through a connection that doesn't have 
> tracing on. This way, any spans created by executing these upsert statements 
> on the client side are NullSpans. 
> On the server side too, when these batched up upsert statements are executed 
> as batchMutate operations, they are not expected to have tracing on, i.e. the 
> current spans should be null. However, we noticed that these current spans 
> are not null, which results in an infinite loop. An example of such an 
> infinite loop is the following:
> batchmutate -> fshlog.append -> check tracing on i.e. current span is not 
> null -> yes -> create milli span -> do operation -> stop span -> publish to 
> metrics source -> phoenix metrics sink -> upsert statement -> batchmutate -> 
> fshlog.append......
> My local cluster in fact dies because of this infinite loop!!
> On examining the thread local of the threads in the RPC thread pool, I saw 
> that there were threads that had current spans that were closed at least an 
> hour before. See the screenshot attached. 
> The screenshot was taken at 11:20 PM and the thread had a current span whose 
> stop time was 10:17 PM. 
> This brought up a couple of design issues/limitations in the HTrace API:
> 1) There is no good way to set (reset) the value of the thread local current 
> span to null. This is a huge issue, especially if we are reusing threads 
> from a thread pool. 
> 2) Should we allow creating spans if the parent span is not running anymore, 
> i.e. Span.isRunning() is false? In the example I have shown in the 
> screenshot, the current span stored in the thread local is already closed. 
> Essentially making isTracing() also check that the span is still running: {code}
> boolean isTracing() {
>   return currentSpan.get() != null && currentSpan.get().isRunning();
> }
> {code}
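
The two points raised in the report can be illustrated with a small, self-contained sketch. It does not use the HTrace API: DemoSpan, StaleSpanDemo, and all method names below are hypothetical stand-ins, and the single-thread executor only plays the role of a reused RPC handler thread. The sketch shows a stopped span surviving in a thread local across tasks, how an isRunning() check keeps it from counting as "tracing", and how restoring the thread local in a finally block avoids the leak in the first place.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a span; this is NOT the HTrace Span interface.
final class DemoSpan {
    private volatile boolean running = true;
    boolean isRunning() { return running; }
    void stop() { running = false; }
}

final class StaleSpanDemo {
    // Plays the role of the thread-local "current span" from the report.
    static final ThreadLocal<DemoSpan> CURRENT = new ThreadLocal<>();

    // Naive check: any non-null span counts as tracing, even a stopped one.
    static boolean isTracingNaive() {
        return CURRENT.get() != null;
    }

    // Check suggested in point 2: the span must also still be running.
    static boolean isTracing() {
        DemoSpan span = CURRENT.get();
        return span != null && span.isRunning();
    }

    // Stops its span but leaves it behind in the thread local.
    static void leakyTask() {
        DemoSpan span = new DemoSpan();
        CURRENT.set(span);
        span.stop();
    }

    // Restores the previous thread-local value when done (point 1).
    static void cleanTask() {
        DemoSpan previous = CURRENT.get();
        DemoSpan span = new DemoSpan();
        CURRENT.set(span);
        try {
            // traced work would go here
        } finally {
            span.stop();
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    // Runs `first`, then a probe, on the same pooled thread.
    // Returns {isTracingNaive, isTracing} as seen by the probe.
    static boolean[] probeAfter(Runnable first) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(first).get();
            Callable<boolean[]> probe =
                () -> new boolean[] { isTracingNaive(), isTracing() };
            return pool.submit(probe).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] leaky = probeAfter(StaleSpanDemo::leakyTask);
        boolean[] clean = probeAfter(StaleSpanDemo::cleanTask);
        System.out.println("after leaky task: naive=" + leaky[0] + " strict=" + leaky[1]);
        System.out.println("after clean task: naive=" + clean[0] + " strict=" + clean[1]);
    }
}
```

In this sketch the strict check only hides the stale span from later tasks on the same thread; the span object still sits in the thread local until something removes it, which is why the cleanup in cleanTask is shown alongside it.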



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
