dlmarion commented on PR #5811:
URL: https://github.com/apache/accumulo/pull/5811#issuecomment-3210663241

   I'm seeing an inconsistency in how we handle Exceptions and Errors in the 
server threads (AbstractServer.run).
   
   In all other threads the AccumuloUncaughtExceptionHandler will:
   
   1. Log an error that the thread is dead on an uncaught Exception
   2. Print the stack trace and a message to stderr then halt the VM on an 
uncaught Error
   
   For critical tasks put into a ThreadPool, we have a method that watches them 
and throws an ExecutionError if the task has failed. This Error is uncaught and 
would bubble up to the AccumuloUncaugthExceptionHandler, terminating the VM.
   
   We made some changes recently to AbstractServer in #5796 and #5808 such that 
a new method `AbstractServer.startServer` exists because exceptions and errors 
being throws from `AbstractServer.runServer` were not being logged. This method 
looks like:
   
   
https://github.com/apache/accumulo/blob/3a0d8e5c72915fb4c7ce26d6156d15b0a444fb71/server/base/src/main/java/org/apache/accumulo/server/AbstractServer.java#L78-L97
   
   WIth this new method if `AbstractServer.runServer` completes successfully, 
then the server process does a normal shutdown finishing with a call to 
`AbstractServer.close`. Due to the `finally` block in 
`AbstractServer.startServer` the server's close method will be called again. If 
`AbstractServer.runServer` throws an Exception or Error, then the server's 
close method will be called by the `finally` block in 
`AbstractServer.startServer`, but the other normal shutdown stuff won't occur.
   
   Looking at one of the logs from ExitCodesIT we see the following:
   ```
   2025-08-21T12:59:37,729 12 [exit_codes.TabletServer_ERROR] INFO : 
TabletServer_ERROR process shut down.
   2025-08-21T12:59:37,729 12 [tserver.TabletServer] ERROR: TabletServer_ERROR 
died, exception thrown from runServer.
   java.lang.StackOverflowError: throwing unknown error
        at 
org.apache.accumulo.test.functional.exit_codes.TabletServer_ERROR.updateIdleStatus(Unknown
 Source) ~[?:?]
        at org.apache.accumulo.tserver.TabletServer.run(TabletServer.java:607) 
~[classes/:?]
        at 
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
 ~[classes/:?]
        at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
   2025-08-21T12:59:37,733 12 [metrics.MetricsInfoImpl] INFO : Closing metrics 
registry
   2025-08-21T12:59:37,750 12 [server.ServerContext] DEBUG: Shutting down 
shared executor pool
   2025-08-21T12:59:37,750 12 [clientImpl.ClientContext] DEBUG: Closing ZooCache
   2025-08-21T12:59:37,750 12 [clientImpl.ClientContext] DEBUG: Closing 
ZooSession
   2025-08-21T12:59:37,752 21 [lock.ServiceLock] DEBUG: 
[zlock#5c6acefc-a1c0-4fdc-8c81-a7acd162c56e#] 
zlock#5c6acefc-a1c0-4fdc-8c81-a7acd162c56e#0000000000 was deleted; WatchedEvent 
state:SyncConnected type:NodeDeleted 
path:/tservers/TEST/ip-10-113-15-120.evoforge.org:33045/zlock#5c6acefc-a1c0-4fdc-8c81-a7acd162c56e#0000000000
 zxid: 827
   2025-08-21T12:59:37,754 21 [util.Halt] ERROR: FATAL TABLET_SERVER lost lock 
(reason = LOCK_DELETED), exiting.
   ```
   
   In any other thread, the VM would be halted without closing. I'm thinking 
that the `finally` block from `AbstractServer.startServer` should be removed. 
Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to