dlmarion commented on code in PR #5576: URL: https://github.com/apache/accumulo/pull/5576#discussion_r2105167839
##########
core/src/main/java/org/apache/accumulo/core/util/threads/Threads.java:
##########
@@ -55,22 +55,26 @@ public static Runnable createNamedRunnable(String name, Runnable r) {
     return new NamedRunnable(name, r);
   }
-  public static Thread createThread(String name, Runnable r) {
-    return createThread(name, OptionalInt.empty(), r, UEH);
+  public static Thread createNonCriticalThread(String name, Runnable r) {

Review Comment:
   I'm not sure we need to rename these to be NonCritical. I think the fact that we have a createCriticalThread implies that the others are non-critical.

##########
server/manager/src/main/java/org/apache/accumulo/manager/Manager.java:
##########
@@ -1265,14 +1265,21 @@ public void run() {
     context.getTableManager().addObserver(this);
-    Thread statusThread = Threads.createThread("Status Thread", new StatusThread());
+    // TODO KEVIN RATHBUN updating the Manager state seems like a critical function. However, the
+    // thread already handles, waits, and continues in the case of any Exception, so critical or
+    // non critical doesn't make a difference here.
+    Thread statusThread = Threads.createCriticalThread("Status Thread", new StatusThread());
     statusThread.start();
-    Threads.createThread("Migration Cleanup Thread", new MigrationCleanupThread()).start();
+    // TODO KEVIN RATHBUN migration cleanup may be a critical function of the manager, but the
+    // thread will already handle, wait, and continue in the case of any Exception, so critical
+    // or non critical doesn't make a difference here.
+    Threads.createCriticalThread("Migration Cleanup Thread", new MigrationCleanupThread()).start();
     tserverSet.startListeningForTabletServerChanges();
-    Threads.createThread("ScanServer Cleanup Thread", new ScanServerZKCleaner()).start();
+    // TODO KEVIN RATHBUN Some ZK cleanup doesn't seem like a critical function of manager
+    Threads.createNonCriticalThread("ScanServer Cleanup Thread", new ScanServerZKCleaner()).start();

Review Comment:
   I'm thinking we may want to make this critical. The clients find ScanServers by looking them up in ZooKeeper. Leaving orphaned entries over a long period of time could progressively slow the clients down. I'm not sure why the thread might die, as all of the known exceptions are handled, but RuntimeException is not caught by the thread, so some unchecked exception could cause it to fail.

##########
server/tserver/src/main/java/org/apache/accumulo/tserver/log/DfsLogger.java:
##########
@@ -475,7 +475,10 @@ public synchronized void open(String address) throws IOException {
       throw new IOException(ex);
     }
-    syncThread = Threads.createThread("Accumulo WALog thread " + this, new LogSyncingTask());
+    // TODO KEVIN RATHBUN this seems like a vital thread for TabletServer, but appears that the
+    // thread will continuously be recreated, so probably fine to stay non critical
+    syncThread =
+        Threads.createNonCriticalThread("Accumulo WALog thread " + this, new LogSyncingTask());

Review Comment:
   The syncThread is created for each DfsLogger; it looks like there is a 1:1 relationship. The syncThread runs a LogSyncingTask, which flushes the write-ahead log according to the user's configured durability. When the write-ahead logs are full, the TabletServer will create new ones and therefore new sync threads. I don't think a sync thread for a specific DfsLogger is recreated, which means that it's possible it could die. I think this might be critical.

--
This is an automated message from the Apache Git Service.
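To illustrate the point in the ScanServer Cleanup Thread comment (an unchecked exception that the Runnable does not catch ends the thread), here is a minimal sketch of how a critical and a non-critical thread could differ. It is not Accumulo's implementation: the class `ThreadCriticalitySketch`, the `logger` and `onFatal` parameters, and both method bodies are hypothetical. The assumption is that the only difference is what the uncaught-exception handler does after reporting the error.

```java
import java.util.function.Consumer;

// Hypothetical sketch; these are NOT the methods in Accumulo's Threads class.
final class ThreadCriticalitySketch {

  private ThreadCriticalitySketch() {}

  // Non-critical: an uncaught exception is reported and only this thread
  // ends. The rest of the process keeps running without the task.
  static Thread createNonCriticalThread(String name, Runnable r,
      Consumer<Throwable> logger) {
    Thread t = new Thread(r, name);
    t.setDaemon(true);
    t.setUncaughtExceptionHandler((thread, err) -> logger.accept(err));
    return t;
  }

  // Critical: an uncaught exception is reported and then escalated through
  // onFatal (assumed here to stop the server, for example by halting the
  // JVM), so the process never keeps running without this task.
  static Thread createCriticalThread(String name, Runnable r,
      Consumer<Throwable> logger, Runnable onFatal) {
    Thread t = new Thread(r, name);
    t.setDaemon(true);
    t.setUncaughtExceptionHandler((thread, err) -> {
      logger.accept(err);
      onFatal.run();
    });
    return t;
  }
}
```

Under this assumption, a task such as the ScanServer cleanup or a WAL sync task that dies on a RuntimeException would stop silently (apart from the log entry) when created as non-critical, while the critical variant would surface the failure at the process level.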