virajjasani commented on a change in pull request #2890: URL: https://github.com/apache/hbase/pull/2890#discussion_r559461474
########## File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java ########## @@ -319,12 +360,18 @@ public synchronized void onChoreMissedStartTime(ScheduledChore chore) { * shutdown the service. Any chores that are scheduled for execution will be cancelled. Any chores * in the middle of execution will be interrupted and shutdown. This service will be unusable * after this method has been called (i.e. future scheduling attempts will fail). + * <p/> + * Notice that, this will only clean the chore from this ChoreService but you could still schedule + * the chore with other ChoreService. Review comment: nit: other or same ChreService instance? ########## File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java ########## @@ -141,28 +140,41 @@ public ChoreService(final String coreThreadPoolPrefix, int corePoolSize, boolean * @return true when the chore was successfully scheduled. false when the scheduling failed * (typically occurs when a chore is scheduled during shutdown of service) */ - public synchronized boolean scheduleChore(ScheduledChore chore) { + public boolean scheduleChore(ScheduledChore chore) { if (chore == null) { return false; } - - try { - if (chore.getPeriod() <= 0) { - LOG.info("Chore {} is disabled because its period is not positive.", chore); - return false; - } - LOG.info("Chore {} is enabled.", chore); - chore.setChoreServicer(this); - ScheduledFuture<?> future = - scheduler.scheduleAtFixedRate(chore, chore.getInitialDelay(), chore.getPeriod(), - chore.getTimeUnit()); - scheduledChores.put(chore, future); - return true; - } catch (Exception exception) { - if (LOG.isInfoEnabled()) { - LOG.info("Could not successfully schedule chore: " + chore.getName()); + // always lock chore first to prevent dead lock + synchronized (chore) { Review comment: Other than here, do we have synchronization on `ScheduledChore` elsewhere? I think all other methods themselves are synchronized which is good but how do we prevent anyone from synchronizing on `ScheduledChore` instance in future? Perhaps for now, we can write a Javadoc comment on `ScheduledChore` providing this info? ########## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java ########## @@ -936,7 +936,7 @@ public void startChore() { */ public void stop() { if (flushedSeqIdFlusher != null) { - flushedSeqIdFlusher.cancel(); + flushedSeqIdFlusher.shutdown(); Review comment: `shutdown(true)` looks better candidate ########## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/RegionServerSpaceQuotaManager.java ########## @@ -98,11 +98,11 @@ public synchronized void start() throws IOException { public synchronized void stop() { if (spaceQuotaRefresher != null) { - spaceQuotaRefresher.cancel(); Review comment: `cancel()` is nothing but `cancel(true)`, so all `cancel()` should be replaced with `shutdown(true)` just like all `cancel(true)` are replaced with `shutdown(true)` ########## File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java ########## @@ -141,28 +140,41 @@ public ChoreService(final String coreThreadPoolPrefix, int corePoolSize, boolean * @return true when the chore was successfully scheduled. false when the scheduling failed * (typically occurs when a chore is scheduled during shutdown of service) */ - public synchronized boolean scheduleChore(ScheduledChore chore) { + public boolean scheduleChore(ScheduledChore chore) { if (chore == null) { return false; } - - try { - if (chore.getPeriod() <= 0) { - LOG.info("Chore {} is disabled because its period is not positive.", chore); - return false; - } - LOG.info("Chore {} is enabled.", chore); - chore.setChoreServicer(this); - ScheduledFuture<?> future = - scheduler.scheduleAtFixedRate(chore, chore.getInitialDelay(), chore.getPeriod(), - chore.getTimeUnit()); - scheduledChores.put(chore, future); - return true; - } catch (Exception exception) { - if (LOG.isInfoEnabled()) { - LOG.info("Could not successfully schedule chore: " + chore.getName()); + // always lock chore first to prevent dead lock + synchronized (chore) { + synchronized (this) { + try { + // Chores should only ever be scheduled with a single ChoreService. If the choreService + // is changing, cancel any existing schedules of this chore. + if (chore.getChoreService() == this) { + LOG.warn("Chore {} has already been scheduled with us", chore); + return false; + } + if (chore.getPeriod() <= 0) { + LOG.info("Chore {} is disabled because its period is not positive.", chore); + return false; + } + LOG.info("Chore {} is enabled.", chore); + if (chore.getChoreService() != null) { + LOG.info("Cancel chore {} from its previous service", chore); + chore.getChoreService().cancelChore(chore); + } + chore.setChoreService(this); + ScheduledFuture<?> future = scheduler.scheduleAtFixedRate(chore, chore.getInitialDelay(), + chore.getPeriod(), chore.getTimeUnit()); + scheduledChores.put(chore, future); + return true; + } catch (Exception exception) { + if (LOG.isInfoEnabled()) { + LOG.info("Could not successfully schedule chore: " + chore.getName()); Review comment: Let's convert this to ERROR with exception stacktrace? ########## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java ########## @@ -1500,11 +1499,9 @@ private void switchSnapshotCleanup(final boolean on) { try { snapshotCleanupTracker.setSnapshotCleanupEnabled(on); if (on) { - if (!getChoreService().isChoreScheduled(this.snapshotCleanerChore)) { - getChoreService().scheduleChore(this.snapshotCleanerChore); - } + getChoreService().scheduleChore(this.snapshotCleanerChore); } else { - getChoreService().cancelChore(this.snapshotCleanerChore); + this.snapshotCleanerChore.cancel(); Review comment: I know it's a pain but we need to have null check here ########## File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java ########## @@ -141,28 +140,41 @@ public ChoreService(final String coreThreadPoolPrefix, int corePoolSize, boolean * @return true when the chore was successfully scheduled. false when the scheduling failed * (typically occurs when a chore is scheduled during shutdown of service) */ - public synchronized boolean scheduleChore(ScheduledChore chore) { + public boolean scheduleChore(ScheduledChore chore) { if (chore == null) { return false; } - - try { - if (chore.getPeriod() <= 0) { - LOG.info("Chore {} is disabled because its period is not positive.", chore); - return false; - } - LOG.info("Chore {} is enabled.", chore); - chore.setChoreServicer(this); - ScheduledFuture<?> future = - scheduler.scheduleAtFixedRate(chore, chore.getInitialDelay(), chore.getPeriod(), - chore.getTimeUnit()); - scheduledChores.put(chore, future); - return true; - } catch (Exception exception) { - if (LOG.isInfoEnabled()) { - LOG.info("Could not successfully schedule chore: " + chore.getName()); + // always lock chore first to prevent dead lock + synchronized (chore) { + synchronized (this) { + try { + // Chores should only ever be scheduled with a single ChoreService. If the choreService + // is changing, cancel any existing schedules of this chore. + if (chore.getChoreService() == this) { + LOG.warn("Chore {} has already been scheduled with us", chore); + return false; + } + if (chore.getPeriod() <= 0) { + LOG.info("Chore {} is disabled because its period is not positive.", chore); + return false; + } + LOG.info("Chore {} is enabled.", chore); + if (chore.getChoreService() != null) { + LOG.info("Cancel chore {} from its previous service", chore); + chore.getChoreService().cancelChore(chore); Review comment: I think it makes sense. So what we are doing is we are taking lock on give `ScheduledChore` instance, followed by current `ChoreService` itself and when we find that this ScheduledChore belongs to different ChoreService instance, we call `cancel()` on that instance, which is anyways synchronized instance method. ########## File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java ########## @@ -354,7 +320,27 @@ protected boolean initialChore() { /** * Override to run cleanup tasks when the Chore encounters an error and must stop running */ - protected synchronized void cleanup() { + protected void cleanup() { + } + + /** + * Call {@link #shutdown(boolean)} with {@code true}. + * @see ScheduledChore#shutdown(boolean) + */ + public synchronized void shutdown() { + cancel(false); + cleanup(); Review comment: This is supposed to call `shutdown(true)` right? Javadoc also says so. ########## File path: hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java ########## @@ -319,12 +360,18 @@ public synchronized void onChoreMissedStartTime(ScheduledChore chore) { * shutdown the service. Any chores that are scheduled for execution will be cancelled. Any chores * in the middle of execution will be interrupted and shutdown. This service will be unusable * after this method has been called (i.e. future scheduling attempts will fail). + * <p/> + * Notice that, this will only clean the chore from this ChoreService but you could still schedule + * the chore with other ChoreService. */ public synchronized void shutdown() { + if (isShutdown()) { + return; + } scheduler.shutdownNow(); if (LOG.isInfoEnabled()) { - LOG.info("Chore service for: " + coreThreadPoolPrefix + " had " + scheduledChores.keySet() - + " on shutdown"); + LOG.info("Chore service for: " + coreThreadPoolPrefix + " had " + scheduledChores.keySet() + + " on shutdown"); } Review comment: I think INFO is quite common level set by majority applications. We better remove `isInfoEnabled()` check and also replace String concat with placeholders with `{}`? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org