[ 
https://issues.apache.org/jira/browse/HTRACE-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299626#comment-14299626
 ] 

Colin Patrick McCabe commented on HTRACE-92:
--------------------------------------------

bq. Thanks for the work on this, Colin Patrick McCabe. FYI, things are looking 
much better now that Samarth Jain was able to track down the missing close. 
HTrace appears to be working well so far - we'll get you more information 
regarding load on cluster, feature requests, etc.

That's great to hear!  Would love to hear more about the load and also feature 
requests.  Have you looked at the new {{HTracedRESTReceiver}} stuff that 
[~stack] has been working on?  I think it might help with the "accidentally 
tracing the tracing system" problem mentioned earlier on this JIRA (if it is 
indeed a problem... I guess the discussion about this wasn't conclusive).

bq. Wanted to bring up the "what makes sense to do in 0.98 HBase question" 
again, stack & Andrew Purtell. Given that there are already ~25 calls to do 
HTrace stuff in HBase, wouldn't it be prudent to add one more HTrace cleanup 
call, either when a thread comes out or goes back into the thread pool? This 
would guard against a coprocessor client missing a span close and subsequently 
bringing down the region server. The call could be yanked back out once it's no 
longer needed (once HBase moves to the Apache HTrace release with improvements 
in this area).

That's an interesting point, and one we haven't really thought about before.  
If we have something like HBase coprocessors, it would be nice to avoid having 
bad coprocessors cause too much damage unintentionally.  I think you should be 
able to (ab)use the existing {{Tracer#continueSpan}} API to force the span to 
be what it "should" be when the coprocessor is done executing.
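
A rough sketch of that save-and-restore idea is below. It deliberately does not use real HTrace classes: {{FakeSpan}}, {{CURRENT}}, {{runGuarded}} and {{leakyCoprocessor}} are hypothetical stand-ins made up for illustration. With HTrace itself, the restore step would presumably go through {{Tracer#continueSpan}} or whatever the release in use provides.

```java
import java.util.concurrent.Callable;

public class SpanGuardSketch {
    /** Hypothetical stand-in for a trace span; NOT the HTrace Span interface. */
    static final class FakeSpan {
        final String description;
        private boolean running = true;
        FakeSpan(String description) { this.description = description; }
        boolean isRunning() { return running; }
        void stop() { running = false; }
    }

    /** Hypothetical stand-in for the per-thread "current span" slot. */
    static final ThreadLocal<FakeSpan> CURRENT = new ThreadLocal<>();

    /** Stricter check from the issue description: a stopped span does not count. */
    static boolean isTracing() {
        FakeSpan span = CURRENT.get();
        return span != null && span.isRunning();
    }

    /**
     * Runs untrusted code, then forces the thread's current span back to
     * whatever it was before the call, even if the callee leaked a span.
     */
    static <T> T runGuarded(Callable<T> untrusted) throws Exception {
        FakeSpan before = CURRENT.get();
        try {
            return untrusted.call();
        } finally {
            if (before == null) {
                CURRENT.remove();      // nothing was being traced: clear the slot
            } else {
                CURRENT.set(before);   // restore the caller's span
            }
        }
    }

    /** A misbehaving callee: installs a span and never clears it. */
    static String leakyCoprocessor() {
        FakeSpan span = new FakeSpan("leaky");
        CURRENT.set(span);
        span.stop();                   // stopped, but still left in the thread local
        return "done";
    }

    public static void main(String[] args) throws Exception {
        String result = runGuarded(SpanGuardSketch::leakyCoprocessor);
        System.out.println(result + ", tracing after call: " + isTracing());
    }
}
```

The restore happens in a {{finally}} block so that a coprocessor which throws still cannot leave a stale span behind on a pooled thread.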

Does it make sense to resolve this JIRA now?

> Thread local storing the currentSpan is never cleared
> -----------------------------------------------------
>
>                 Key: HTRACE-92
>                 URL: https://issues.apache.org/jira/browse/HTRACE-92
>             Project: HTrace
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Samarth Jain
>         Attachments: Screen Shot 2015-01-27 at 11.20.36 PM.png
>
>
> In Apache Phoenix, we use HTrace to provide request level trace information. 
> The trace information (traceid, parentid, spanid, description) among other 
> columns is stored in a Phoenix table. 
> Spans that are MilliSpan get persisted to the Phoenix table this way:
> MilliSpans -> PhoenixTraceMetricsSource -> PhoenixMetricsSink -> Phoenix table
> While inserting the traces to the phoenix table, we make sure that the upsert 
> happening in sink is through a connection that doesn't have tracing on. So 
> this way any spans that are created by executing these upsert statements on 
> the client side are NullSpans. 
> On the server side too, when these batched-up upsert statements are executed as 
> batchMutate operations, they are not expected to have tracing on, i.e. the 
> current spans should be null. However, we noticed that these current spans are not 
> null, which results in an infinite loop. An example of such an infinite 
> loop is the following:
> batchmutate -> fshlog.append -> check tracing on i.e. current span is not 
> null -> yes -> create milli span -> do operation -> stop span -> publish to 
> metrics source -> phoenix metrics sink -> upsert statement -> batchmutate -> 
> fshlog.append......
> My local cluster in fact dies because of this infinite loop!
> On examining the thread local of the threads in the RPC thread pool, I saw 
> that there were threads that had current spans that were closed at least an 
> hour before. See the screenshot attached. 
> The screenshot was taken at 11:20 PM and the thread had a current span whose 
> stop time was 10:17 PM. 
> This brought up a couple of design issues/limitations in the HTrace API:
> 1) There is no good way to set (reset) the value of the thread local current 
> span to null. This is a huge issue, especially if we are reusing threads 
> from a thread pool. 
> 2) Should we allow creating spans if the parent span is not running anymore, 
> i.e. Span.isRunning() is false? In the example I have shown in the 
> screenshot, the current span stored in the thread local is already closed. 
> Essentially, this would mean making the check {code}
> boolean isTracing() {
>   return currentSpan.get() != null && currentSpan.get().isRunning();
> }
> {code} 


