Looks good. Ran the tests 10 times and they didn't fail. Many thanks for looking at it!
Dan

> On Aug 29, 2018, at 9:36 PM, Andriy Redko <[email protected]> wrote:
>
> Hi Dan,
>
> There was indeed an issue with properly handling joinSpan() in the
> AbstractBraveProvider. The span was indeed finishing, but joinSpan() was
> creating a copy of it (same traceId / spanId), and that copy was never
> closed properly. I added a fix to discard such spans (since they are not
> useful, just duplicates), but I am unsure whether joinSpan() is needed in
> the first place (looking at that at the moment). The Jenkins infra is down;
> locally my test runs are stable and green. May I ask you to run the
> tracing tests to check that the issue is gone? @Colm, the one(s) you have
> seen failing before should be fixed as well. Thanks.
>
> Best Regards,
> Andriy Redko
>
>>> On Aug 29, 2018, at 2:57 PM, Andrey Redko <[email protected]> wrote:
>>>
>>> Hey Dan,
>>>
>>> We could try to add a short delay (to let the spans flush) or trigger a
>>> flush forcibly (if that is feasible). Colm also reported a few tests
>>> failing for him a while back. I will make it my priority to stabilize
>>> them. Thanks.
>
> DK> I don't think that will fix it in this case. For the async calls, I
> DK> don't think anything is "finishing" the parent span. The
> DK> BraveTracerContext.wrap call creates a child scope which is then
> DK> closed, but nothing actually finishes the span that is created in
> DK> AbstractBraveProvider (line 58). That span is propagated (line 70) and
> DK> then used as the parent, but as far as I can tell it is never finished
> DK> and thus never flushed out until it is garbage collected or something.
> DK> Not sure if, on the continuation resume, we need to grab the span and
> DK> create a scope or something for it.
>
> DK> Not sure if that helps at all.
>
> DK> Dan
>
>>> Best Regards,
>>> Andriy Redko
>
>>> On Wed, Aug 29, 2018, 2:33 PM Daniel Kulp <[email protected]> wrote:
>
>>>> The tracing systests have been very unstable for me, failing more often
>>>> than not with failures in
>>>> org.apache.cxf.systest.jaxrs.tracing.brave.BraveTracingTest. The test it
>>>> eventually fails in within that class seems relatively random. In each
>>>> case, the number of spans is greater than what is expected. Is anyone
>>>> else seeing that?
>
>>>> I tried digging into it and it LOOKS like the calls to "get
>>>> /bookstore/books/async" are leaving an "inFlight" span in the Tracer.
>>>> That span is then delivered at some point in the future, which then
>>>> causes a later test to fail.
>
>>>> 0: {"traceId":"3a5f1a7d2de45f49","parentId":"3a5f1a7d2de45f49","id":"b0f4e2ddef4251f5","name":"processing books","timestamp":1535566440433652,"duration":200595,"localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"}}
>>>> 1: {"traceId":"3a5f1a7d2de45f49","id":"3a5f1a7d2de45f49","kind":"SERVER","name":"get /bookstore/books/async","timestamp":1535566440423025,"duration":212695,"localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"},"tags":{"http.method":"GET","http.path":"/bookstore/books/async"}}
>>>> Tracer{inFlight=[{"traceId":"3a5f1a7d2de45f49","id":"3a5f1a7d2de45f49","localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"},"shared":true}],
>>>> reporter=org.apache.cxf.systest.brave.TestSpanReporter@1da2cb77}
>
>>>> Is there something missing on the server side in the async case to close
>>>> off the span or something?
-- 
Daniel Kulp
[email protected] - http://dankulp.com/blog
Talend Community Coder - http://talend.com
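
For anyone digging into this later, here is a minimal, self-contained sketch of the behaviour described above, written against the plain Brave 5 API rather than the actual CXF provider or test code (the in-memory reporter wiring and class name are assumptions): a server span that is started for the async request but never finished stays in the tracer's in-flight set and only reaches the reporter once something finally finishes it, which is why a subsequent test can see more spans than it expects.

import java.util.ArrayList;
import java.util.List;

import brave.Span;
import brave.Tracer;
import brave.Tracing;
import zipkin2.reporter.Reporter;

public class UnfinishedAsyncSpanSketch {
    public static void main(String[] args) {
        // Collect reported spans in memory, similar in spirit to the
        // TestSpanReporter used by the systests (wiring assumed here).
        List<zipkin2.Span> reported = new ArrayList<>();
        Reporter<zipkin2.Span> reporter = reported::add;

        Tracing tracing = Tracing.newBuilder()
                .localServiceName("unknown")
                .spanReporter(reporter)
                .build();
        Tracer tracer = tracing.tracer();

        // Server span for "get /bookstore/books/async": started, but if
        // nothing calls finish() when the continuation resumes, it stays
        // in the tracer's in-flight set.
        Span serverSpan = tracer.newTrace()
                .name("get /bookstore/books/async")
                .kind(Span.Kind.SERVER)
                .start();

        // Child span for the actual processing; it is finished, so it is
        // reported right away ("processing books" in the output above).
        Span processing = tracer.newChild(serverSpan.context())
                .name("processing books")
                .start();
        processing.finish();

        System.out.println(reported.size()); // 1 -- the server span is still in flight

        // Only once the server span is finished does it leave the in-flight
        // set and reach the reporter; if that happens "later", a subsequent
        // test observes the extra span.
        serverSpan.finish();
        System.out.println(reported.size()); // 2

        tracing.close();
    }
}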

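And a similarly rough sketch of the joinSpan() duplication mentioned in the fix above, again assuming the plain Brave API rather than the AbstractBraveProvider code: joining an existing context hands back a second, "shared" span recording with the same traceId / spanId, and unless that copy is finished or abandoned it lingers as another in-flight span; discarding the duplicate keeps it out of the report.

import java.util.ArrayList;
import java.util.List;

import brave.Span;
import brave.Tracer;
import brave.Tracing;
import brave.propagation.TraceContext;
import zipkin2.reporter.Reporter;

public class JoinSpanDuplicateSketch {
    public static void main(String[] args) {
        List<zipkin2.Span> reported = new ArrayList<>();
        Reporter<zipkin2.Span> reporter = reported::add;

        Tracing tracing = Tracing.newBuilder()
                .localServiceName("unknown")
                .spanReporter(reporter)
                .build();
        Tracer tracer = tracing.tracer();

        // The original server span, finished normally and reported once.
        Span server = tracer.newTrace()
                .name("get /bookstore/books/async")
                .kind(Span.Kind.SERVER)
                .start();
        TraceContext ctx = server.context();
        server.finish();

        // joinSpan() on the same context yields a second span recording
        // with the same traceId / spanId. Left alone, it would sit in the
        // in-flight set just like the unfinished span in the sketch above.
        Span joined = tracer.joinSpan(ctx);

        // Discarding the duplicate (rather than finishing it) keeps it out
        // of the report -- the spirit of the fix described in the thread.
        joined.abandon();

        System.out.println(reported.size()); // 1, no duplicate reported

        tracing.close();
    }
}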