dlmarion commented on code in PR #3677:
URL: https://github.com/apache/accumulo/pull/3677#discussion_r1289956468


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java:
##########
@@ -901,6 +902,39 @@ public void close(boolean saveState) throws IOException {
   void initiateClose(boolean saveState) {
     log.trace("initiateClose(saveState={}) {}", saveState, getExtent());
 
+    // Check to see if last or current minc is failing. If so, then throw
+    // an exception before closing the compactable and leaving the tablet
+    // in a half-closed state. Don't throw IllegalStateException because
+    // calling code will just continue to retry.
+    if (saveState) {
+      if (!isLastMinCSuccessful()) {
+        if (isMinorCompactionRunning()) {
+          // Then current minor compaction is retrying and failure is being
+          // reported.
+          String msg = "Aborting close on " + extent
+              + " because last minor compaction was not successful. Check the 
server log.";
+          log.warn(msg);
+          throw new RuntimeException(msg);
+        } else {
+          // We don't know when the last minc occurred. Kick one off now. It's 
possible
+          // that a minc will not be initiated (if there is no data in the 
table for example)
+          if (initiateMinorCompaction(MinorCompactionReason.CLOSE)) {
+            getTabletMemory().waitForMinC();
+          }
+        }
+      }
+      // We don't want to initiate the close process on the tablet if the last 
minor compaction
+      // failed. Let the user resolve whatever the issue may be so that we get 
a successful
+      // minor compaction on the tablet before closing it.
+      if (!isLastMinCSuccessful()) {
+        String msg = "Aborting close on " + extent
+            + " because last minor compaction was not successful. Check the 
server log.";
+        log.warn(msg);
+        throw new RuntimeException(msg);
+      }
+
+    }
+

Review Comment:
   > If we find more places where the minor compaction code could have retried 
and it did not, then I think we should modify that minor compaction code to 
make it retry there.
   
   If we do that, and remove this check or something like it, then until we 
found and fixed all of the places where this happens Tablets could still get 
into a bad state where the only remedy is a restart of all TabletServers.
   
   > This code has some race conditions, for example a minor compaction could 
fail in another thread right after the check in this function. So the checks 
here do not completely solve the problem.
   
   Agreed, it's best effort. Maybe there is a way to fix the race condition and 
prevent another minc from starting? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to