keith-turner commented on code in PR #5576: URL: https://github.com/apache/accumulo/pull/5576#discussion_r2109473622
########## server/manager/src/main/java/org/apache/accumulo/manager/Manager.java: ########## @@ -1265,14 +1265,21 @@ public void run() { context.getTableManager().addObserver(this); - Thread statusThread = Threads.createThread("Status Thread", new StatusThread()); + // TODO KEVIN RATHBUN updating the Manager state seems like a critical function. However, the + // thread already handles, waits, and continues in the case of any Exception, so critical or + // non critical doesn't make a difference here. + Thread statusThread = Threads.createCriticalThread("Status Thread", new StatusThread()); Review Comment: > so critical or non critical doesn't make a difference here still seems nice to make this change though, makes the code more resilient to changes in the function. Like if a bit of code was added outside the try block in the method and it threw a runtime exception. ########## server/manager/src/main/java/org/apache/accumulo/manager/Manager.java: ########## @@ -1616,14 +1625,16 @@ private TServer setupReplication() Property.MANAGER_REPLICATION_COORDINATOR_THREADCHECK, maxMessageSizeProperty); log.info("Started replication coordinator service at " + replAddress.address); + // TODO KEVIN RATHBUN this thread creation exists within a task which is labeled non-critical + // so assuming these are as well. // Start the daemon to scan the replication table and make units of work - replicationWorkThread = Threads.createThread("Replication Driver", + replicationWorkThread = Threads.createNonCriticalThread("Replication Driver", new org.apache.accumulo.manager.replication.ReplicationDriver(this)); replicationWorkThread.start(); // Start the daemon to assign work to tservers to replicate to our peers var wd = new org.apache.accumulo.manager.replication.WorkDriver(this); - replicationAssignerThread = Threads.createThread(wd.getName(), wd); + replicationAssignerThread = Threads.createNonCriticalThread(wd.getName(), wd); Review Comment: > within a task which is labeled non-critical Where is it labeled non-critical? ########## server/tserver/src/main/java/org/apache/accumulo/tserver/TabletClientHandler.java: ########## @@ -1246,24 +1246,24 @@ public void loadTablet(TInfo tinfo, TCredentials credentials, String lock, TabletLogger.loading(extent, server.getTabletSession()); final AssignmentHandler ah = new AssignmentHandler(server, extent); - // final Runnable ah = new LoggingRunnable(log, ); // Root tablet assignment must take place immediately if (extent.isRootTablet()) { - Threads.createThread("Root Tablet Assignment", () -> { + // TODO KEVIN RATHBUN I think this should remain non critical. This method is ultimately + // called by TabletGroupWatcher.flushChanges which is always called within a loop, so will + // continue to retry/recreate the thread + Threads.createNonCriticalThread("Root Tablet Assignment", () -> { Review Comment: Yeah this triggered by an RPC and the manager will keep making that RPC periodically. ########## server/monitor/src/main/java/org/apache/accumulo/monitor/Monitor.java: ########## @@ -514,7 +514,9 @@ public void run() { } // need to regularly fetch data so plot data is updated - Threads.createThread("Data fetcher", () -> { + // TODO KEVIN RATHBUN don't think this is a critical function of the Monitor (and the + // RuntimeException is already handled here) + Threads.createNonCriticalThread("Data fetcher", () -> { Review Comment: Even though it handles any exception, would be good to mark this as critical. If this thread stops running some data displayed on the monitor would be be frozen and would never update. ########## server/tserver/src/main/java/org/apache/accumulo/tserver/log/DfsLogger.java: ########## @@ -475,7 +475,10 @@ public synchronized void open(String address) throws IOException { throw new IOException(ex); } - syncThread = Threads.createThread("Accumulo WALog thread " + this, new LogSyncingTask()); + // TODO KEVIN RATHBUN this seems like a vital thread for TabletServer, but appears that the + // thread will continuously be recreated, so probably fine to stay non critical + syncThread = + Threads.createNonCriticalThread("Accumulo WALog thread " + this, new LogSyncingTask()); Review Comment: Yeah this should probably be critical. I suspect if it dies that writes to a tserver could freeze indefinitely. ########## core/src/main/java/org/apache/accumulo/core/util/threads/Threads.java: ########## @@ -55,22 +55,26 @@ public static Runnable createNamedRunnable(String name, Runnable r) { return new NamedRunnable(name, r); } - public static Thread createThread(String name, Runnable r) { - return createThread(name, OptionalInt.empty(), r, UEH); + public static Thread createNonCriticalThread(String name, Runnable r) { Review Comment: > My goal was to force callers to always make a conscious decision about which method to use. One possible way to achieve this is via an enum that must always be passed to the method instead of encoding the information int the method name. Feel free to ignore this comment, not sure it would be an improvement was just a thought based on the goal mentioned. ```java enum ThreadType { CRITICAL, NON_CRITICAL } public static Thread createThread(String name, ThreadType type,, Runnable r) {...} `` ########## core/src/main/java/org/apache/accumulo/core/clientImpl/ThriftTransportPool.java: ########## @@ -73,7 +73,8 @@ public class ThriftTransportPool { private ThriftTransportPool(LongSupplier maxAgeMillis) { this.maxAgeMillis = maxAgeMillis; - this.checkThread = Threads.createThread("Thrift Connection Pool Checker", () -> { + // TODO KEVIN RATHBUN all this does is perform some resource cleanup, so may not be critical. + this.checkThread = Threads.createNonCriticalThread("Thrift Connection Pool Checker", () -> { Review Comment: This thread dying could cause idle connections to accumulo server processes to be left around indefinitely which could make those server processes unhealthy. One tricky part about this code is that it runs in client and server code, for the case of client code not sure about halting the client process if this thread dies (would be a sudden change in behavior, would it be easy for someone to determine why their client process halted?). Seems like it would be good to halt an accumuo server process if this thread dies as that would help keep metadata tablet servers healthy. Seems like this is the only change in client side code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org