dlmarion commented on code in PR #3677:
URL: https://github.com/apache/accumulo/pull/3677#discussion_r1289984526


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java:
##########
@@ -901,6 +902,39 @@ public void close(boolean saveState) throws IOException {
   void initiateClose(boolean saveState) {
     log.trace("initiateClose(saveState={}) {}", saveState, getExtent());
 
+    // Check to see if last or current minc is failing. If so, then throw
+    // an exception before closing the compactable and leaving the tablet
+    // in a half-closed state. Don't throw IllegalStateException because
+    // calling code will just continue to retry.
+    if (saveState) {
+      if (!isLastMinCSuccessful()) {
+        if (isMinorCompactionRunning()) {
+          // Then current minor compaction is retrying and failure is being
+          // reported.
+          String msg = "Aborting close on " + extent
+              + " because last minor compaction was not successful. Check the 
server log.";
+          log.warn(msg);
+          throw new RuntimeException(msg);
+        } else {
+          // We don't know when the last minc occurred. Kick one off now. It's 
possible
+          // that a minc will not be initiated (if there is no data in the 
table for example)
+          if (initiateMinorCompaction(MinorCompactionReason.CLOSE)) {
+            getTabletMemory().waitForMinC();
+          }
+        }
+      }
+      // We don't want to initiate the close process on the tablet if the last 
minor compaction
+      // failed. Let the user resolve whatever the issue may be so that we get 
a successful
+      // minor compaction on the tablet before closing it.
+      if (!isLastMinCSuccessful()) {
+        String msg = "Aborting close on " + extent
+            + " because last minor compaction was not successful. Check the 
server log.";
+        log.warn(msg);
+        throw new RuntimeException(msg);
+      }
+
+    }
+

Review Comment:
   > If I understand correctly, the observed issue was fixed by catching 
Exception instead of IOException in MinorCompaction task. If that is correct 
then I think it may be best to remove this code for the following reasons.
   
   I believe that is true, I think that solved this specific issue. The code I 
added in `Tablet.initiateClose` was to try and prevent the tablet from getting 
into a bad state for some other unknown failure scenario. My concern is that a 
user could miss the fact that minor compactions are failing. Then, either due 
to a call to `split`, Tablet migration, or TabletServer close the Tablet gets 
put into a half-closed state because `compactable.close` is called. The only 
remedy for this situation currently is a hard shutdown of the TabletServer. The 
user does not have the option of fixing the issue and getting a successful 
minor compaction. Depending on how long this has been occurring, the recovery 
time could be lengthy. It this issue affects multiple TabletServers, then you 
could be looking at an entire system restart with a long recovery period.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to