kevinrr888 commented on PR #5028:
URL: https://github.com/apache/accumulo/pull/5028#issuecomment-2536791070

   I was testing this yesterday and was able to reproduce the same issue by 
running `UserFateOpsCommandsIT.testFateFailCommandTimeout()`. I got a jstack 
trace of the worker thread that was still running:
   ```
   "accumulo.pool.manager.fate-Worker-3" #64 daemon prio=5 os_prio=0 cpu=130772.51ms elapsed=138.20s tid=0x000075971000c9b0 nid=0x610aa runnable  [0x0000759863ffd000]
      java.lang.Thread.State: RUNNABLE
        at java.util.concurrent.LinkedTransferQueue.awaitMatch([email protected]/LinkedTransferQueue.java:652)
        at java.util.concurrent.LinkedTransferQueue.xfer([email protected]/LinkedTransferQueue.java:616)
        at java.util.concurrent.LinkedTransferQueue.poll([email protected]/LinkedTransferQueue.java:1294)
        at org.apache.accumulo.core.fate.Fate$TransactionRunner.reserveFateTx(Fate.java:134)
        at org.apache.accumulo.core.fate.Fate$TransactionRunner.run(Fate.java:154)
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
        at java.lang.Thread.run([email protected]/Thread.java:840)
   ```
   While the worker is in this state, the following message is printed 
repeatedly (it comes from `Fate.shutdown()`):
   ```
   Fate USER is waiting for worker threads to terminate
   ```
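
   For context, here is a minimal sketch of the general pattern the trace and 
the log message point at: a pool worker that loops on a queue poll, and a 
shutdown that waits for the pool to terminate. This is not the actual `Fate` 
code. The class `WorkerShutdownSketch`, the `keepRunning` flag, the timed 
`poll`, and the wait bound are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch only; names and structure are NOT taken from Fate.java.
public class WorkerShutdownSketch {
  private final LinkedTransferQueue<String> workQueue = new LinkedTransferQueue<>();
  private final AtomicBoolean keepRunning = new AtomicBoolean(true);
  private final ExecutorService pool = Executors.newFixedThreadPool(1);

  // Worker loop: poll with a timeout so the flag is re-checked regularly.
  void start() {
    pool.execute(() -> {
      while (keepRunning.get()) {
        try {
          String tx = workQueue.poll(100, TimeUnit.MILLISECONDS);
          if (tx != null) {
            // A real worker would process the transaction here.
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    });
  }

  // Ask the worker to stop, then wait (up to maxWaitMillis) for it to exit.
  boolean shutdown(long maxWaitMillis) throws InterruptedException {
    keepRunning.set(false);
    pool.shutdown();
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    while (!pool.awaitTermination(50, TimeUnit.MILLISECONDS)) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // worker never observed the stop request
      }
      System.out.println("waiting for worker threads to terminate");
    }
    return true;
  }

  // Runs the sketch end to end; returns true if the worker terminated.
  public static boolean demo() {
    WorkerShutdownSketch sketch = new WorkerShutdownSketch();
    sketch.start();
    sketch.workQueue.offer("tx-1");
    try {
      return sketch.shutdown(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

   In a loop shaped like this, the wait message keeps printing only if the 
worker never sees the stop request or never returns from its poll, so those 
are the two places I would look first. Whether either applies to `Fate` is 
still an open question.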
   I tried to reproduce it again today with some added debugging statements to 
figure out why the worker was still running, but after many attempts I did not 
see the failure again. Since I can't reproduce the bug consistently and can't 
see from the code how it could occur, I'm out of ideas for tracking it down. 
   
   I'm not sure whether this is a bug in my test logic for the two tests where 
I have seen it (`testFateFailCommandTimeout`, `testFateDeleteCommandTimeout`), 
in which case the bug doesn't matter, or a bug in the preexisting `Fate` code 
that is unrelated to these tests and this PR and just happened to show up here.
   
   Whenever you get the chance, @keith-turner, could you look at the test 
logic, the `Fate` code, and the stack trace above to see if anything stands 
out? If you think this potential bug in `Fate.java` is not worth worrying 
about, we can just move on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
