Hi Dan,

There was indeed an issue with how joinSpan() is handled in
AbstractBraveProvider. The original span was being finished, but
joinSpan() was creating a copy of it (same traceId / spanId) that was
never closed properly. I added a fix to discard such spans (they are
nothing but duplicates), although I am not sure joinSpan() is needed in
the first place (looking into that at the moment). The Jenkins infra is
down, but on my local machine the test runs are now stable and green, so
may I ask you to please run the tracing tests and check whether the
issue is gone? @Colm, the one(s) you saw failing before should be fixed
as well. Thanks.
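
For illustration only, here is a rough sketch of the discard idea using
plain Brave APIs; the helper method and the way it is wired in are
hypothetical, not the actual AbstractBraveProvider code:

    import brave.Span;
    import brave.Tracer;
    import brave.propagation.TraceContextOrSamplingFlags;

    public final class JoinSpanSketch {
        // If joinSpan() hands back a span with the same traceId / spanId as
        // the one we already track, abandon() it so it is dropped without
        // being reported, instead of leaving it "inFlight" forever.
        static Span joinOrDiscard(Tracer tracer, TraceContextOrSamplingFlags extracted, Span existing) {
            if (extracted.context() == null) {
                return existing;
            }
            final Span joined = tracer.joinSpan(extracted.context());
            if (existing != null
                    && joined.context().traceId() == existing.context().traceId()
                    && joined.context().spanId() == existing.context().spanId()) {
                joined.abandon();   // pure duplicate, never report it
                return existing;
            }
            return joined;
        }
    }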

Best Regards,
    Andriy Redko

>> On Aug 29, 2018, at 2:57 PM, Andrey Redko <[email protected]> wrote:

>> Hey Dan,

>> We could try to add a short delay (to let the spans flush) or trigger the
>> flush forcibly (if that is feasible). Colm also reported a few tests failing
>> for him a while back. I will make it my priority to stabilize them. Thanks.

DK> I don’t think that will fix it in this case.  For the async calls, I
DK> don’t think anything is “finishing” the parent span.  The
DK> BraveTracerContext.wrap call creates a child scope which is then closed,
DK> but nothing actually finishes the span that is created in
DK> AbstractBraveProvider (line 58).  That span is propagated (line 70) and
DK> then used as the parent, but as far as I can tell it is never finished and
DK> thus never flushed out until it is garbage collected or something.  Not
DK> sure if, on the continuation resume, we need to grab the span and create a
DK> scope or something for it.

DK> Not sure if that helps at all.

DK> Dan
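
For the span Dan describes above, here is a minimal sketch of the "grab the
span and finish it on completion" idea, again with plain Brave APIs (the
method and the way the server span is handed in are hypothetical, not the
actual CXF wiring):

    import brave.Span;
    import brave.Tracer;

    public final class AsyncFinishSketch {
        // On continuation resume, re-enter the stored server span so any
        // child spans still parent correctly, then finish it once the async
        // response has been written so it is flushed to the reporter instead
        // of staying "inFlight".
        static void completeAsync(Tracer tracer, Span serverSpan, Runnable writeResponse) {
            try (Tracer.SpanInScope scope = tracer.withSpanInScope(serverSpan)) {
                writeResponse.run();
            } finally {
                serverSpan.finish();   // without this the span is never reported
            }
        }
    }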



>> Best Regards,
>>    Andriy Redko

>> On Wed, Aug 29, 2018, 2:33 PM Daniel Kulp <[email protected]> wrote:


>>> The tracing systests have been very unstable for me, failing more often
>>> than not with failures in
>>> org.apache.cxf.systest.jaxrs.tracing.brave.BraveTracingTest.  The test
>>> that eventually fails within that class seems relatively random.  In each
>>> case, the number of spans is greater than what is expected.  Is anyone
>>> else seeing that?

>>> I tried digging into it and it LOOKS like the calls to “get
>>> /bookstore/books/async” are leaving an “inFlight” span in the Tracer.
>>> That span is then delivered at some point in the future, which then
>>> causes a later test to fail.

>>> 0:
>>> {"traceId":"3a5f1a7d2de45f49","parentId":"3a5f1a7d2de45f49","id":"b0f4e2ddef4251f5","name":"processing
>>> books","timestamp":1535566440433652,"duration":200595,"localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"}}
>>> 1:
>>> {"traceId":"3a5f1a7d2de45f49","id":"3a5f1a7d2de45f49","kind":"SERVER","name":"get
>>> /bookstore/books/async","timestamp":1535566440423025,"duration":212695,"localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"},"tags":{"http.method":"GET","http.path":"/bookstore/books/async"}}
>>> Tracer{inFlight=[{"traceId":"3a5f1a7d2de45f49","id":"3a5f1a7d2de45f49","localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"},"shared":true}],
>>> reporter=org.apache.cxf.systest.brave.TestSpanReporter@1da2cb77}


>>> Is there something missing on the server side in the async case to close
>>> off the span or something?


>>> --
>>> Daniel Kulp
>>> [email protected] - http://dankulp.com/blog
>>> Talend Community Coder - http://talend.com

