[ 
https://issues.apache.org/jira/browse/ACCUMULO-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs resolved ACCUMULO-4457.
-----------------------------------------
    Resolution: Cannot Reproduce

Haven't seen this in awhile.

> ConditionalWriterIT.testTrace fails a lot
> -----------------------------------------
>
>                 Key: ACCUMULO-4457
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4457
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.6.6
>            Reporter: Christopher Tubbs
>            Priority: Major
>
> So, I've been trying to test 1.6.6 release candidate, and I keep getting 
> frequent failures of the ConditionalWriterIT.testTrace.
> I've seen two kinds of failures, both of which result in timeouts (because 
> the IT continues to retry until the timeout).
> The first is a very simple case: it can't find any traces. This is probably 
> due to a startup race condition where the client runs before the tracer is 
> up. The spammy output looks like:
> {code}
> 2016-09-14 17:04:26,799 [test.ConditionalWriterIT] INFO : Ignoring trace 
> output as traceCount not greater than zero: 0
> 2016-09-14 17:04:27,802 [test.ConditionalWriterIT] INFO : Trace output:Did 
> not find any traces!
> 2016-09-14 17:04:27,802 [test.ConditionalWriterIT] INFO : Ignoring trace 
> output as traceCount not greater than zero: 0
> 2016-09-14 17:04:28,805 [test.ConditionalWriterIT] INFO : Trace output:Did 
> not find any traces!
> 2016-09-14 17:04:28,805 [test.ConditionalWriterIT] INFO : Ignoring trace 
> output as traceCount not greater than zero: 0
> 2016-09-14 17:04:29,808 [test.ConditionalWriterIT] INFO : Trace output:Did 
> not find any traces!
> {code}
> The TraceServer's log file contain no activity after startup (no activity 
> indicated for successful run, either... the tracer probably just doesn't log 
> much, or I don't have the log threshold low enough).
> The second, and more annoying failure is that it appears sometimes, some 
> spans are unrooted.
> The output of a failed run looks like:
> {code}
> 2016-09-13 22:15:06,464 [test.ConditionalWriterIT] INFO : Ignoring trace 
> output as traceCount not greater than zero: -1
> 2016-09-13 22:15:07,475 [test.ConditionalWriterIT] INFO : Trace output:Trace 
> started at 2016/09/13 22:15:01.046
> Time  Start  Service@Location       Name
>   322+0      testTrace@localhost traceTest
>     0+35       testTrace@localhost client:startScan
>    27+40       [email protected] startScan
>     0+67       testTrace@localhost client:closeScan
>     0+68       [email protected] closeScan
>     0+68       testTrace@localhost client:startScan
>     3+69       [email protected] startScan
>     1+70         [email protected] metadata tablets read ahead 7
>     0+72       testTrace@localhost client:closeScan
>     0+73       [email protected] closeScan
> Warning: the following spans are not rooted!
> 98074c50fb57bce3 44a1f6292a6ec858 client:startScan
> 803c22e524cfaab4 c0944de077dcfbaa metadata tablets read ahead 2
> c0944de077dcfbaa 44a1f6292a6ec858 startScan
> e87ec35eb1a659ee 44a1f6292a6ec858 client:closeScan
> 75e4a9576afc000b 98bfb7e65f27e99 client:startConditionalUpdate
> bd6250a4d3050aba 44a1f6292a6ec858 closeScan
> 98bfb7e65f27e99 44a1f6292a6ec858 ConditionalWriterImpl 2
> d83a024f5e4dbae5 98bfb7e65f27e99 startConditionalUpdate
> 45472c5b3fc34b0b 98bfb7e65f27e99 client:conditionalUpdate
> 42d81ffa77149a94 98bfb7e65f27e99 conditionalUpdate
> 2d6c20f144b406cd 42d81ffa77149a94 Check conditions
> 715be33f3216fe99 42d81ffa77149a94 apply conditional mutations
> ff11b4df0e8ddd94 715be33f3216fe99 prep
> 9acf9f34eeb4c7c1 715be33f3216fe99 wal
> 191305dbf1fb40ff 9acf9f34eeb4c7c1 update
> 3dd49cdc4b5a2d7d 9acf9f34eeb4c7c1 client:update
> e31c17c89e1b3cdc 191305dbf1fb40ff prep
> 84dbf7ebd59be75 191305dbf1fb40ff commit
> 198bc516183d8e7f 191305dbf1fb40ff wal
> 61b91d25c997c748 715be33f3216fe99 commit
> {code}
> For comparison, the output of a successful run looks like:
> {code}
> 2016-09-14 17:07:57,125 [test.ConditionalWriterIT] INFO : Ignoring trace 
> output as traceCount not greater than zero: -1
> 2016-09-14 17:07:58,134 [test.ConditionalWriterIT] INFO : Trace output:Trace 
> started at 2016/09/14 17:07:54.663
> Time  Start  Service@Location       Name
>   428+0      testTrace@localhost traceTest
>     3+360      testTrace@localhost client:startScan
>     4+365      [email protected] startScan
>     0+377      testTrace@localhost client:closeScan
>     0+378      [email protected] closeScan
>     0+380      testTrace@localhost client:startScan
>     5+381      [email protected] startScan
>     0+385        [email protected] metadata tablets read ahead 4
>     0+388      [email protected] closeScan
>     0+388      testTrace@localhost client:closeScan
>    39+389      testTrace@localhost ConditionalWriterImpl 1
>     1+397        testTrace@localhost client:startConditionalUpdate
>     2+402        [email protected] startConditionalUpdate
>     2+408        testTrace@localhost client:conditionalUpdate
>     8+417        [email protected] conditionalUpdate
>     2+419          [email protected] Check conditions
>     4+421          [email protected] apply conditional mutations
>     1+421            [email protected] prep
>     3+422            [email protected] wal 
>     0+423              [email protected] client:update
>     2+423              [email protected] update
>     0+424                [email protected] wal 
>     0+424                [email protected] prep
>     0+425                [email protected] commit
>     0+425            [email protected] commit
> The following is all the nodes!
> c63771e049ce64df 0 traceTest
> 346315d5c08dacd c63771e049ce64df client:startScan
> 83a6b294967ec2c2 c63771e049ce64df startScan
> c31f792dd3cf0479 c63771e049ce64df client:closeScan
> 6847784061285c9f c63771e049ce64df closeScan
> 6675da322b33924f c63771e049ce64df client:startScan
> 874c41927959277a c63771e049ce64df startScan
> 7f26ece0e74cebd3 874c41927959277a metadata tablets read ahead 4
> 7e4d4904313a0096 c63771e049ce64df closeScan
> 9324832a57fbe3c1 c63771e049ce64df client:closeScan
> e6cccd862b014fc2 c63771e049ce64df ConditionalWriterImpl 1
> e8c2e500c3b87beb e6cccd862b014fc2 client:startConditionalUpdate
> 1bbe4c4bc22565d9 e6cccd862b014fc2 startConditionalUpdate
> f13d4a0dfbb322ed e6cccd862b014fc2 client:conditionalUpdate
> 2ec39d24415847b1 e6cccd862b014fc2 conditionalUpdate
> 2b076238cd26cb8f 2ec39d24415847b1 Check conditions
> c0addb5c7a6b3fca 2ec39d24415847b1 apply conditional mutations
> 2dd63e5ea321119d c0addb5c7a6b3fca prep
> d6958323b0cb7f5e c0addb5c7a6b3fca wal 
> 616296d843722d0a d6958323b0cb7f5e client:update
> db05c61214e16ce5 d6958323b0cb7f5e update
> 9d3417d7b56a21e0 db05c61214e16ce5 prep
> 2d66569d22a19871 db05c61214e16ce5 wal 
> 21cd8265be1e6c94 c0addb5c7a6b3fca commit
> b381168631ad019c db05c61214e16ce5 commit
> {code}
> I added code to dump the span IDs and parent IDs for the rooted spans, to 
> more easily compare.
> The biggest thing that stands out for me is that the one with the description 
> {{ConditionalWriterImpl 2}} is not rooted at "testTrace" like the 
> {{ConditionalWriterImpl 1}} is in the successful case.
> As far as I can tell, the failure case is always case {{2}}, and the 
> successful case is always case {{1}}.
> It looks like the problem is caused by the {{NamingThreadFactory}} creating 
> wrapping {{Runnable}} s with a new {{TraceRunnable}} based on the current 
> thread's currentSpan. It's not known which thread is creating new threads 
> within the thread pool's executor. Instead of having {{NamingThreadFactory}} 
> wrap when it creates a new executor thread, we need to ensure we wrap the 
> spans before we place them into the thread pool (when we call {{execute()}}), 
> based on the originating  thread's currentSpan.
> The {{NamingThreadFactory}} is used in multiple places, so it may not be an 
> easy fix to address this in all places.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to