Hi Dan,
There was indeed an issue with how joinSpan() is handled in
AbstractBraveProvider. The span itself was being finished, but
joinSpan() was creating a copy of it (same traceId/spanId) that was
never closed properly. I added a fix to discard such spans, since they
are nothing but duplicates, but I am not sure joinSpan() is needed in
the first place (still looking into that). The Jenkins infra is down;
on my local machine the test runs are stable and green, so may I ask
you to please run the tracing tests and check whether the issue is
gone? @Colm, the one(s) you have seen failing before should be fixed
as well. Thanks.
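
For context, the shape of the change is roughly the following. This is a
simplified sketch only, not the actual AbstractBraveProvider code (the
class and method names are made up for illustration), but the
joinSpan()/shared()/abandon() calls are the relevant Brave APIs:

    import brave.Span;
    import brave.Tracer;
    import brave.propagation.TraceContext;

    final class JoinSpanSketch {

        // If joinSpan() hands back a copy of a span that is already being
        // finished elsewhere (same traceId/spanId, shared = true), discard
        // it with abandon() rather than leaving it open as an in-flight
        // duplicate, and carry on with a regular child span instead.
        static Span joinOrDiscard(Tracer tracer, TraceContext extracted) {
            Span span = tracer.joinSpan(extracted);
            if (span.context().shared()) {
                span.abandon();                      // discard without reporting
                return tracer.newChild(extracted);   // regular child span
            }
            return span;
        }

        private JoinSpanSketch() {
        }
    }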
Best Regards,
Andriy Redko
>> On Aug 29, 2018, at 2:57 PM, Andrey Redko <[email protected]> wrote:
>> Hey Dan,
>> We could try to add a short delay (to let the spans flush) or trigger a
>> flush forcibly (if that is feasible). Colm also reported a few tests
>> failing for him a while back. I will make it my priority to stabilize
>> them. Thanks.
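
(For reference, "trigger a flush forcibly" would look roughly like the
following if a zipkin AsyncReporter were wired in; this is just a sketch
under that assumption, since the systests appear to use an in-memory
TestSpanReporter, in which case there is nothing to flush and this would
not help.)

    import zipkin2.Span;
    import zipkin2.reporter.AsyncReporter;
    import zipkin2.reporter.urlconnection.URLConnectionSender;

    public class FlushSketch {
        public static void main(String[] args) throws Exception {
            URLConnectionSender sender =
                URLConnectionSender.create("http://localhost:9411/api/v2/spans");
            AsyncReporter<Span> reporter = AsyncReporter.create(sender);

            // ... spans get reported asynchronously while the test runs ...

            reporter.flush();   // push any buffered spans out before asserting
            reporter.close();
            sender.close();
        }
    }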
DK> I don’t think that will fix it in this case. For the async calls, I
DK> don’t think anything is “finishing” the parent span. The
DK> BraveTracerContext.wrap call creates a child scope which is then
DK> closed, but nothing actually finishes the span that is created in
DK> AbstractBraveProvider (line 58). That span is propagated (line 70) and
DK> then used as the parent, but as far as I can tell it is never finished
DK> and thus never flushed out until it is garbage collected or something.
DK> Not sure whether, on the continuation resume, we need to grab the span
DK> and create a scope or something for it.
DK> Not sure if that helps at all.
DK> Dan
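
To make that idea concrete, the rough shape would be something like the
following, written against plain JAX-RS filters rather than the real CXF
provider classes (the class name and the property key are made up for
illustration): the request filter starts and stashes the server span, and
the response filter, which also runs once a suspended/async invocation
resumes and completes, finishes it instead of leaving it in-flight.

    import java.io.IOException;

    import javax.ws.rs.container.ContainerRequestContext;
    import javax.ws.rs.container.ContainerRequestFilter;
    import javax.ws.rs.container.ContainerResponseContext;
    import javax.ws.rs.container.ContainerResponseFilter;

    import brave.Span;
    import brave.Tracing;

    public class SketchTracingFilter implements ContainerRequestFilter, ContainerResponseFilter {

        // Hypothetical property key, for illustration only.
        private static final String TRACE_SPAN = "sketch.trace.span";

        private final Tracing tracing;

        public SketchTracingFilter(Tracing tracing) {
            this.tracing = tracing;
        }

        @Override
        public void filter(ContainerRequestContext requestContext) throws IOException {
            Span span = tracing.tracer()
                    .nextSpan()
                    .name(requestContext.getMethod() + " " + requestContext.getUriInfo().getPath())
                    .start();
            requestContext.setProperty(TRACE_SPAN, span);
        }

        @Override
        public void filter(ContainerRequestContext requestContext,
                           ContainerResponseContext responseContext) throws IOException {
            Object span = requestContext.getProperty(TRACE_SPAN);
            if (span instanceof Span) {
                // Without this, async requests leave the server span in-flight.
                ((Span) span).finish();
            }
        }
    }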
>> Best Regards,
>> Andriy Redko
>> On Wed, Aug 29, 2018, 2:33 PM Daniel Kulp <[email protected]> wrote:
>>> The tracing systests have been very unstable for me, failing more often
>>> than not with failures in
>>> org.apache.cxf.systest.jaxrs.tracing.brave.BraveTracingTest. The
>>> particular test in that class that eventually fails seems relatively
>>> random. In each case, the number of spans is greater than expected. Is
>>> anyone else seeing that?
>>> I tried digging into it and it LOOKS like the calls to "get
>>> /bookstore/books/async” are leaving an “inFlight” span in the Tracer.
>>> That span is then delivered at some point in the future which then causes a
>>> test to fail.
>>> 0:
>>> {"traceId":"3a5f1a7d2de45f49","parentId":"3a5f1a7d2de45f49","id":"b0f4e2ddef4251f5","name":"processing
>>> books","timestamp":1535566440433652,"duration":200595,"localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"}}
>>> 1:
>>> {"traceId":"3a5f1a7d2de45f49","id":"3a5f1a7d2de45f49","kind":"SERVER","name":"get
>>> /bookstore/books/async","timestamp":1535566440423025,"duration":212695,"localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"},"tags":{"http.method":"GET","http.path":"/bookstore/books/async"}}
>>> Tracer{inFlight=[{"traceId":"3a5f1a7d2de45f49","id":"3a5f1a7d2de45f49","localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"},"shared":true}],
>>> reporter=org.apache.cxf.systest.brave.TestSpanReporter@1da2cb77}
>>> Is there something missing on the server side in the async case to close
>>> off the span or something?
>>> --
>>> Daniel Kulp
>>> [email protected] - http://dankulp.com/blog
>>> Talend Community Coder - http://talend.com