[ https://issues.apache.org/jira/browse/HTRACE-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295597#comment-14295597 ]
Andrew Purtell commented on HTRACE-92:
--------------------------------------
I filed HBASE-12938
> Thread local storing the currentSpan is never cleared
> -----------------------------------------------------
>
> Key: HTRACE-92
> URL: https://issues.apache.org/jira/browse/HTRACE-92
> Project: HTrace
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Samarth Jain
> Attachments: Screen Shot 2015-01-27 at 11.20.36 PM.png
>
>
> In Apache Phoenix, we use HTrace to provide request-level trace information.
> The trace information (traceid, parentid, spanid, description), among other
> columns, is stored in a Phoenix table.
> Spans that are MilliSpans get persisted to the Phoenix table this way:
> MilliSpans -> PhoenixTraceMetricsSource -> PhoenixMetricsSink -> Phoenix table
> While inserting the traces into the Phoenix table, we make sure that the
> upsert in the sink goes through a connection that doesn't have tracing on.
> This way, any spans that are created by executing these upsert statements on
> the client side are NullSpans.
> On the server side too, when these batched-up upsert statements are executed
> as batchMutate operations, they are not expected to have tracing on, i.e. the
> current spans should be null. However, we noticed that these current spans
> are not null, which results in an infinite loop. An example of such an
> infinite loop is the following:
> batchmutate -> fshlog.append -> check tracing on i.e. current span is not
> null -> yes -> create milli span -> do operation -> stop span -> publish to
> metrics source -> phoenix metrics sink -> upsert statement -> batchmutate ->
> fshlog.append......
> My local cluster in fact dies because of this infinite loop!
> On examining the thread local of the threads in the RPC thread pool, I saw
> that there were threads that had current spans that were closed at least an
> hour before. See the screenshot attached.
> The screenshot was taken at 11:20 PM and the thread had a current span whose
> stop time was 10:17 PM.
> This brought up a couple of design issues/limitations in the HTrace API:
> 1) There is no good way to set (reset) the value of the thread local current
> span to null. This is a huge issue, especially if we are reusing threads
> from a thread pool.
> 2) Should we allow creating spans if the parent span is not running anymore,
> i.e. Span.isRunning() is false? In the example I have shown in the
> screenshot, the current span stored in the thread local is already closed.
> Essentially, this would mean making isTracing() look like:
> {code}
> boolean isTracing() {
>   return currentSpan.get() != null && currentSpan.get().isRunning();
> }
> {code}
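
To illustrate the two points above, here is a minimal Java sketch of a
thread-local current span that is always restored or cleared in a finally
block, together with an isTracing() that also checks isRunning(). All names
here (CurrentSpanSketch, DemoSpan, runTraced, demo) are hypothetical stand-ins
and are not part of the HTrace API; this is only one possible way to keep a
stale span from leaking into the next task that runs on a pooled thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; none of these types are the real HTrace classes.
class CurrentSpanSketch {
    /** Minimal stand-in for a span: it only tracks whether it is running. */
    static final class DemoSpan {
        private volatile boolean running = true;
        boolean isRunning() { return running; }
        void stop() { running = false; }
    }

    private static final ThreadLocal<DemoSpan> CURRENT = new ThreadLocal<>();

    /** Tracing is on only if a span is set AND that span is still running. */
    static boolean isTracing() {
        DemoSpan span = CURRENT.get();
        return span != null && span.isRunning();
    }

    /** Runs work under a new span, then restores or clears the thread local. */
    static void runTraced(Runnable work) {
        DemoSpan previous = CURRENT.get();
        DemoSpan span = new DemoSpan();
        CURRENT.set(span);
        try {
            work.run();
        } finally {
            span.stop();
            if (previous == null) {
                CURRENT.remove();      // nothing to restore: clear the slot
            } else {
                CURRENT.set(previous); // restore the enclosing span
            }
        }
    }

    /** Runs a traced task, then an untraced one, on the same pooled thread. */
    static boolean[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Boolean> traced = () -> {
                boolean[] seen = new boolean[1];
                runTraced(() -> seen[0] = isTracing());
                return seen[0];
            };
            Callable<Boolean> untraced = CurrentSpanSketch::isTracing;
            boolean insideTraced = pool.submit(traced).get();
            boolean nextTask = pool.submit(untraced).get();
            return new boolean[] { insideTraced, nextTask };
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("inside traced task: " + r[0]);       // true
        System.out.println("next task on same thread: " + r[1]); // false
    }
}
```

With the finally block in place, the second task on the reused thread sees no
current span, so a sink that writes traces would not start a new span for its
own upserts. The isRunning() check in isTracing() is a second line of defense
for the case where a closed span is left behind anyway.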