Looks good. Ran the tests 10 times and they didn't fail. Many thanks for looking at it!
Dan

> On Aug 29, 2018, at 9:36 PM, Andriy Redko <[email protected]> wrote:
>
> Hi Dan,
>
> There was indeed an issue with properly handling joinSpan() in the
> AbstractBraveProvider. The span was indeed finishing, but joinSpan() was
> creating a copy of it (same traceId / spanId), and that copy was never
> closed properly. I added a fix to discard such spans (since they are not
> useful, just duplicates), but I am unsure whether joinSpan() is needed in
> the first place (looking at that at the moment). The Jenkins infra is down;
> locally my test runs are stable and green. May I ask you to run the
> tracing tests to check that the issue is gone? @Colm, the one(s) you have
> seen failing before should be fixed as well. Thanks.
>
> Best Regards,
> Andriy Redko
>
>>> On Aug 29, 2018, at 2:57 PM, Andrey Redko <[email protected]> wrote:
>>>
>>> Hey Dan,
>>>
>>> We could try to add a short delay (to let the spans flush) or trigger a
>>> flush forcibly (if that is feasible). Colm also reported a few tests
>>> failing for him a while back. I will make it my priority to stabilize
>>> them. Thanks.
>
> DK> I don't think that will fix it in this case. For the async calls, I
> DK> don't think anything is "finishing" the parent span. The
> DK> BraveTracerContext.wrap call creates a child scope which is then
> DK> closed, but nothing actually finishes the span that is created in
> DK> AbstractBraveProvider (line 58). That span is propagated (line 70) and
> DK> then used as the parent, but as far as I can tell it is never finished
> DK> and thus never flushed out until it is garbage collected or something.
> DK> Not sure if, on the continuation resume, we need to grab the span and
> DK> create a scope or something for it.
>
> DK> Not sure if that helps at all.
>
> DK> Dan
>
>>> Best Regards,
>>> Andriy Redko
>
>>> On Wed, Aug 29, 2018, 2:33 PM Daniel Kulp <[email protected]> wrote:
>
>>>> The tracing systests have been very unstable for me, failing more often
>>>> than not with failures in
>>>> org.apache.cxf.systest.jaxrs.tracing.brave.BraveTracingTest. The test it
>>>> eventually fails in within that class seems relatively random. In each
>>>> case, the number of spans is greater than what is expected. Is anyone
>>>> else seeing that?
>
>>>> I tried digging into it and it LOOKS like the calls to "get
>>>> /bookstore/books/async" are leaving an "inFlight" span in the Tracer.
>>>> That span is then delivered at some point in the future, which then
>>>> causes a later test to fail.
>
>>>> 0: {"traceId":"3a5f1a7d2de45f49","parentId":"3a5f1a7d2de45f49","id":"b0f4e2ddef4251f5","name":"processing books","timestamp":1535566440433652,"duration":200595,"localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"}}
>>>> 1: {"traceId":"3a5f1a7d2de45f49","id":"3a5f1a7d2de45f49","kind":"SERVER","name":"get /bookstore/books/async","timestamp":1535566440423025,"duration":212695,"localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"},"tags":{"http.method":"GET","http.path":"/bookstore/books/async"}}
>>>> Tracer{inFlight=[{"traceId":"3a5f1a7d2de45f49","id":"3a5f1a7d2de45f49","localEndpoint":{"serviceName":"unknown","ipv4":"192.168.1.180"},"shared":true}],
>>>> reporter=org.apache.cxf.systest.brave.TestSpanReporter@1da2cb77}
>
>>>> Is there something missing on the server side in the async case to close
>>>> off the span or something?
-- 
Daniel Kulp
[email protected] - http://dankulp.com/blog
Talend Community Coder - http://talend.com
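
For anyone digging into this later, here is a minimal, self-contained sketch of the behaviour described above, written against the plain Brave 5 API rather than the actual CXF provider or test code (the in-memory reporter wiring and class name are assumptions): a server span that is started for the async request but never finished stays in the tracer's in-flight set and only reaches the reporter once something finally finishes it, which is why a subsequent test can see more spans than it expects.

import java.util.ArrayList;
import java.util.List;

import brave.Span;
import brave.Tracer;
import brave.Tracing;
import zipkin2.reporter.Reporter;

public class UnfinishedAsyncSpanSketch {
    public static void main(String[] args) {
        // Collect reported spans in memory, similar in spirit to the
        // TestSpanReporter used by the systests (wiring assumed here).
        List<zipkin2.Span> reported = new ArrayList<>();
        Reporter<zipkin2.Span> reporter = reported::add;

        Tracing tracing = Tracing.newBuilder()
                .localServiceName("unknown")
                .spanReporter(reporter)
                .build();
        Tracer tracer = tracing.tracer();

        // Server span for "get /bookstore/books/async": started, but if
        // nothing calls finish() when the continuation resumes, it stays
        // in the tracer's in-flight set.
        Span serverSpan = tracer.newTrace()
                .name("get /bookstore/books/async")
                .kind(Span.Kind.SERVER)
                .start();

        // Child span for the actual processing; it is finished, so it is
        // reported right away ("processing books" in the output above).
        Span processing = tracer.newChild(serverSpan.context())
                .name("processing books")
                .start();
        processing.finish();

        System.out.println(reported.size()); // 1 -- the server span is still in flight

        // Only once the server span is finished does it leave the in-flight
        // set and reach the reporter; if that happens "later", a subsequent
        // test observes the extra span.
        serverSpan.finish();
        System.out.println(reported.size()); // 2

        tracing.close();
    }
}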

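And a similarly rough sketch of the joinSpan() duplication mentioned in the fix above, again assuming the plain Brave API rather than the AbstractBraveProvider code: joining an existing context hands back a second, "shared" span recording with the same traceId / spanId, and unless that copy is finished or abandoned it lingers as another in-flight span; discarding the duplicate keeps it out of the report.

import java.util.ArrayList;
import java.util.List;

import brave.Span;
import brave.Tracer;
import brave.Tracing;
import brave.propagation.TraceContext;
import zipkin2.reporter.Reporter;

public class JoinSpanDuplicateSketch {
    public static void main(String[] args) {
        List<zipkin2.Span> reported = new ArrayList<>();
        Reporter<zipkin2.Span> reporter = reported::add;

        Tracing tracing = Tracing.newBuilder()
                .localServiceName("unknown")
                .spanReporter(reporter)
                .build();
        Tracer tracer = tracing.tracer();

        // The original server span, finished normally and reported once.
        Span server = tracer.newTrace()
                .name("get /bookstore/books/async")
                .kind(Span.Kind.SERVER)
                .start();
        TraceContext ctx = server.context();
        server.finish();

        // joinSpan() on the same context yields a second span recording
        // with the same traceId / spanId. Left alone, it would sit in the
        // in-flight set just like the unfinished span in the sketch above.
        Span joined = tracer.joinSpan(ctx);

        // Discarding the duplicate (rather than finishing it) keeps it out
        // of the report -- the spirit of the fix described in the thread.
        joined.abandon();

        System.out.println(reported.size()); // 1, no duplicate reported

        tracing.close();
    }
}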