keith-turner commented on code in PR #5576:
URL: https://github.com/apache/accumulo/pull/5576#discussion_r2109473622


##########
server/manager/src/main/java/org/apache/accumulo/manager/Manager.java:
##########
@@ -1265,14 +1265,21 @@ public void run() {
 
     context.getTableManager().addObserver(this);
 
-    Thread statusThread = Threads.createThread("Status Thread", new 
StatusThread());
+    // TODO KEVIN RATHBUN updating the Manager state seems like a critical 
function. However, the
+    // thread already handles, waits, and continues in the case of any 
Exception, so critical or
+    // non critical doesn't make a difference here.
+    Thread statusThread = Threads.createCriticalThread("Status Thread", new 
StatusThread());

Review Comment:
   > so critical or non critical doesn't make a difference here
   
   still seems nice to make this change though, makes the code more resilient 
to changes in the function.  Like if a bit of code was added outside the try 
block in the method and it threw a runtime exception.



##########
server/manager/src/main/java/org/apache/accumulo/manager/Manager.java:
##########
@@ -1616,14 +1625,16 @@ private TServer setupReplication()
         Property.MANAGER_REPLICATION_COORDINATOR_THREADCHECK, 
maxMessageSizeProperty);
 
     log.info("Started replication coordinator service at " + 
replAddress.address);
+    // TODO KEVIN RATHBUN this thread creation exists within a task which is 
labeled non-critical
+    // so assuming these are as well.
     // Start the daemon to scan the replication table and make units of work
-    replicationWorkThread = Threads.createThread("Replication Driver",
+    replicationWorkThread = Threads.createNonCriticalThread("Replication 
Driver",
         new org.apache.accumulo.manager.replication.ReplicationDriver(this));
     replicationWorkThread.start();
 
     // Start the daemon to assign work to tservers to replicate to our peers
     var wd = new org.apache.accumulo.manager.replication.WorkDriver(this);
-    replicationAssignerThread = Threads.createThread(wd.getName(), wd);
+    replicationAssignerThread = Threads.createNonCriticalThread(wd.getName(), 
wd);

Review Comment:
   > within a task which is labeled non-critical
   
   Where is it labeled non-critical?



##########
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletClientHandler.java:
##########
@@ -1246,24 +1246,24 @@ public void loadTablet(TInfo tinfo, TCredentials 
credentials, String lock,
     TabletLogger.loading(extent, server.getTabletSession());
 
     final AssignmentHandler ah = new AssignmentHandler(server, extent);
-    // final Runnable ah = new LoggingRunnable(log, );
     // Root tablet assignment must take place immediately
 
     if (extent.isRootTablet()) {
-      Threads.createThread("Root Tablet Assignment", () -> {
+      // TODO KEVIN RATHBUN I think this should remain non critical. This 
method is ultimately
+      // called by TabletGroupWatcher.flushChanges which is always called 
within a loop, so will
+      // continue to retry/recreate the thread
+      Threads.createNonCriticalThread("Root Tablet Assignment", () -> {

Review Comment:
   Yeah this triggered by an RPC and the manager will keep making that RPC 
periodically.



##########
server/monitor/src/main/java/org/apache/accumulo/monitor/Monitor.java:
##########
@@ -514,7 +514,9 @@ public void run() {
     }
 
     // need to regularly fetch data so plot data is updated
-    Threads.createThread("Data fetcher", () -> {
+    // TODO KEVIN RATHBUN don't think this is a critical function of the 
Monitor (and the
+    // RuntimeException is already handled here)
+    Threads.createNonCriticalThread("Data fetcher", () -> {

Review Comment:
   Even though it handles any exception, would be good to mark this as 
critical.  If this thread stops running some data displayed on the monitor 
would be be frozen and would never update.



##########
server/tserver/src/main/java/org/apache/accumulo/tserver/log/DfsLogger.java:
##########
@@ -475,7 +475,10 @@ public synchronized void open(String address) throws 
IOException {
       throw new IOException(ex);
     }
 
-    syncThread = Threads.createThread("Accumulo WALog thread " + this, new 
LogSyncingTask());
+    // TODO KEVIN RATHBUN this seems like a vital thread for TabletServer, but 
appears that the
+    // thread will continuously be recreated, so probably fine to stay non 
critical
+    syncThread =
+        Threads.createNonCriticalThread("Accumulo WALog thread " + this, new 
LogSyncingTask());

Review Comment:
   Yeah this should probably be critical.  I suspect if it dies that writes to 
a tserver could freeze indefinitely. 



##########
core/src/main/java/org/apache/accumulo/core/util/threads/Threads.java:
##########
@@ -55,22 +55,26 @@ public static Runnable createNamedRunnable(String name, 
Runnable r) {
     return new NamedRunnable(name, r);
   }
 
-  public static Thread createThread(String name, Runnable r) {
-    return createThread(name, OptionalInt.empty(), r, UEH);
+  public static Thread createNonCriticalThread(String name, Runnable r) {

Review Comment:
   > My goal was to force callers to always make a conscious decision about 
which method to use.
   
   One possible way to achieve this is via an enum that must always be passed 
to the method instead of encoding the information int the method name. Feel 
free to ignore this comment, not sure it would be an improvement was just a 
thought based on the goal mentioned.
   
   
   ```java
      enum ThreadType {
         CRITICAL,
         NON_CRITICAL
      }
   
      public static Thread createThread(String name, ThreadType type,, Runnable 
r) {...}
   
   ``



##########
core/src/main/java/org/apache/accumulo/core/clientImpl/ThriftTransportPool.java:
##########
@@ -73,7 +73,8 @@ public class ThriftTransportPool {
 
   private ThriftTransportPool(LongSupplier maxAgeMillis) {
     this.maxAgeMillis = maxAgeMillis;
-    this.checkThread = Threads.createThread("Thrift Connection Pool Checker", 
() -> {
+    // TODO KEVIN RATHBUN all this does is perform some resource cleanup, so 
may not be critical.
+    this.checkThread = Threads.createNonCriticalThread("Thrift Connection Pool 
Checker", () -> {

Review Comment:
   This thread dying could cause idle connections to accumulo server processes 
to be left around indefinitely which could make those server processes 
unhealthy.  One tricky part about this code is that it runs in client and 
server code, for the case of client code not sure about halting the client 
process if this thread dies (would be a sudden change in behavior, would it be 
easy for someone to determine why their client process halted?).  Seems like it 
would be good to halt an accumuo server process if this thread dies as that 
would help keep metadata tablet servers healthy.
   
   Seems like this is the only change in client side code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to