dlmarion commented on PR #5811: URL: https://github.com/apache/accumulo/pull/5811#issuecomment-3210663241
I'm seeing an inconsistency in how we handle Exceptions and Errors in the server threads (AbstractServer.run). In all other threads the AccumuloUncaughtExceptionHandler will: 1. Log an error that the thread is dead on an uncaught Exception 2. Print the stack trace and a message to stderr then halt the VM on an uncaught Error For critical tasks put into a ThreadPool, we have a method that watches them and throws an ExecutionError if the task has failed. This Error is uncaught and would bubble up to the AccumuloUncaugthExceptionHandler, terminating the VM. We made some changes recently to AbstractServer in #5796 and #5808 such that a new method `AbstractServer.startServer` exists because exceptions and errors being throws from `AbstractServer.runServer` were not being logged. This method looks like: https://github.com/apache/accumulo/blob/3a0d8e5c72915fb4c7ce26d6156d15b0a444fb71/server/base/src/main/java/org/apache/accumulo/server/AbstractServer.java#L78-L97 WIth this new method if `AbstractServer.runServer` completes successfully, then the server process does a normal shutdown finishing with a call to `AbstractServer.close`. Due to the `finally` block in `AbstractServer.startServer` the server's close method will be called again. If `AbstractServer.runServer` throws an Exception or Error, then the server's close method will be called by the `finally` block in `AbstractServer.startServer`, but the other normal shutdown stuff won't occur. Looking at one of the logs from ExitCodesIT we see the following: ``` 2025-08-21T12:59:37,729 12 [exit_codes.TabletServer_ERROR] INFO : TabletServer_ERROR process shut down. 2025-08-21T12:59:37,729 12 [tserver.TabletServer] ERROR: TabletServer_ERROR died, exception thrown from runServer. java.lang.StackOverflowError: throwing unknown error at org.apache.accumulo.test.functional.exit_codes.TabletServer_ERROR.updateIdleStatus(Unknown Source) ~[?:?] at org.apache.accumulo.tserver.TabletServer.run(TabletServer.java:607) ~[classes/:?] at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[classes/:?] at java.base/java.lang.Thread.run(Thread.java:829) [?:?] 2025-08-21T12:59:37,733 12 [metrics.MetricsInfoImpl] INFO : Closing metrics registry 2025-08-21T12:59:37,750 12 [server.ServerContext] DEBUG: Shutting down shared executor pool 2025-08-21T12:59:37,750 12 [clientImpl.ClientContext] DEBUG: Closing ZooCache 2025-08-21T12:59:37,750 12 [clientImpl.ClientContext] DEBUG: Closing ZooSession 2025-08-21T12:59:37,752 21 [lock.ServiceLock] DEBUG: [zlock#5c6acefc-a1c0-4fdc-8c81-a7acd162c56e#] zlock#5c6acefc-a1c0-4fdc-8c81-a7acd162c56e#0000000000 was deleted; WatchedEvent state:SyncConnected type:NodeDeleted path:/tservers/TEST/ip-10-113-15-120.evoforge.org:33045/zlock#5c6acefc-a1c0-4fdc-8c81-a7acd162c56e#0000000000 zxid: 827 2025-08-21T12:59:37,754 21 [util.Halt] ERROR: FATAL TABLET_SERVER lost lock (reason = LOCK_DELETED), exiting. ``` In any other thread, the VM would be halted without closing. I'm thinking that the `finally` block from `AbstractServer.startServer` should be removed. Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org