keith-turner commented on code in PR #5193: URL: https://github.com/apache/accumulo/pull/5193#discussion_r1918982579
########## server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java: ########## @@ -926,26 +930,70 @@ public void run() { } } - // wait for shutdown - // if the main thread exits oldServer the manager listener, the JVM will - // kill the other threads and finalize objects. We want the shutdown that is - // running in the manager listener thread to complete oldServer this happens. - // consider making other threads daemon threads so that objects don't - // get prematurely finalized - synchronized (this) { - while (!shutdownComplete) { - try { - this.wait(1000); - } catch (InterruptedException e) { - log.error(e.toString()); - } - } - } log.debug("Stopping Replication Server"); if (this.replServer != null) { this.replServer.stop(); } + // Best-effort attempt at unloading tablets. + log.debug("Unloading tablets"); + final List<Future<?>> futures = new ArrayList<>(); + final ThreadPoolExecutor tpe = getContext().threadPools() + .getPoolBuilder(ThreadPoolNames.TSERVER_SHUTDOWN_UNLOAD_TABLET_POOL).numCoreThreads(8) + .numMaxThreads(16).build(); + + ManagerClientService.Client iface = managerConnection(getManagerAddress()); + boolean managerDown = false; + + try { + for (DataLevel level : new DataLevel[] {DataLevel.USER, DataLevel.METADATA, DataLevel.ROOT}) { Review Comment: While the tablet server is working through this the manager may look at the tablet server and see it has less tablets than other and decided to assign new tablets to it. Was wondering if we could set the tsever to ignore new assignments before starting this unload processing, however even if we do that the manager will set a future location and tablets w/ a future location still go through log recovery. The longer this unload processing takes the more likely the manager will assign tablets. If tablets are assigned then that increases the chance of log recovery still happening which is something this change s trying to avoid. The most likely suspect for causing this unload processing to take a long time would be minor compactions. One strategy to deal with this is to do as much work as possible w/o making the manager aware that anything is happening. The Unload handler does two things it closes the tablet which includes minor compactions and then it update the tablets location in the metadata table. In order to minimize the time the manager has to react to unassingments could do something like the following. 1. Go through all tablets in a level and close them but leave their location as is. Closing the tablet object will also cause it stop accepting new writes. 2. Go through all the tablets that were closed in the prev step and update their location This avoids updating any locations until all tablets are closed which gives the manager a smaller time window to notice this and start assigning stuff to the tablet server. If this time window is normally really small then the manager will usually never react to it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@accumulo.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org