keith-turner commented on code in PR #5193:
URL: https://github.com/apache/accumulo/pull/5193#discussion_r1918982579


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java:
##########
@@ -926,26 +930,70 @@ public void run() {
       }
     }
 
-    // wait for shutdown
-    // if the main thread exits oldServer the manager listener, the JVM will
-    // kill the other threads and finalize objects. We want the shutdown that 
is
-    // running in the manager listener thread to complete oldServer this 
happens.
-    // consider making other threads daemon threads so that objects don't
-    // get prematurely finalized
-    synchronized (this) {
-      while (!shutdownComplete) {
-        try {
-          this.wait(1000);
-        } catch (InterruptedException e) {
-          log.error(e.toString());
-        }
-      }
-    }
     log.debug("Stopping Replication Server");
     if (this.replServer != null) {
       this.replServer.stop();
     }
 
+    // Best-effort attempt at unloading tablets.
+    log.debug("Unloading tablets");
+    final List<Future<?>> futures = new ArrayList<>();
+    final ThreadPoolExecutor tpe = getContext().threadPools()
+        
.getPoolBuilder(ThreadPoolNames.TSERVER_SHUTDOWN_UNLOAD_TABLET_POOL).numCoreThreads(8)
+        .numMaxThreads(16).build();
+
+    ManagerClientService.Client iface = managerConnection(getManagerAddress());
+    boolean managerDown = false;
+
+    try {
+      for (DataLevel level : new DataLevel[] {DataLevel.USER, 
DataLevel.METADATA, DataLevel.ROOT}) {

Review Comment:
   While the tablet server is working through this the manager may look at the 
tablet server and see it has less tablets than other and decided to assign new 
tablets to it.  Was wondering if we could set the tsever to ignore new 
assignments before starting this unload processing, however even if we do that 
the manager will set a future location and tablets w/ a future location still 
go through log recovery.
   
   The longer this unload processing takes the more likely the manager will 
assign tablets.  If tablets are assigned then that increases the chance of log 
recovery still happening which is something this change s trying to avoid.  The 
most likely suspect for causing this unload processing to take a long time 
would be minor compactions.
   
   One strategy to deal with this is to do as much work as possible w/o making 
the manager aware that anything is happening.  The Unload handler does two 
things it closes the tablet which includes minor compactions and then it update 
the tablets location in the metadata table.  In order to minimize the time the 
manager has to react to unassingments could do something like the following.
   
    1. Go through all tablets in a level and close them but leave their 
location as is.  Closing the tablet object will also cause it stop accepting 
new writes.
    2. Go through all the tablets that were closed in the prev step and update 
their location
   
   This avoids updating any locations until all tablets are closed which gives 
the manager a smaller time window to notice this and start assigning stuff to 
the tablet server.  If this time window is normally really small then the 
manager will usually never react to it.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to