dlmarion commented on code in PR #3677:
URL: https://github.com/apache/accumulo/pull/3677#discussion_r1289984526
##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java:
##########
@@ -901,6 +902,39 @@ public void close(boolean saveState) throws IOException {
void initiateClose(boolean saveState) {
log.trace("initiateClose(saveState={}) {}", saveState, getExtent());
+ // Check to see if last or current minc is failing. If so, then throw
+ // an exception before closing the compactable and leaving the tablet
+ // in a half-closed state. Don't throw IllegalStateException because
+ // calling code will just continue to retry.
+ if (saveState) {
+ if (!isLastMinCSuccessful()) {
+ if (isMinorCompactionRunning()) {
+ // Then current minor compaction is retrying and failure is being
+ // reported.
+ String msg = "Aborting close on " + extent
+ + " because last minor compaction was not successful. Check the
server log.";
+ log.warn(msg);
+ throw new RuntimeException(msg);
+ } else {
+ // We don't know when the last minc occurred. Kick one off now. It's
possible
+ // that a minc will not be initiated (if there is no data in the
table for example)
+ if (initiateMinorCompaction(MinorCompactionReason.CLOSE)) {
+ getTabletMemory().waitForMinC();
+ }
+ }
+ }
+ // We don't want to initiate the close process on the tablet if the last
minor compaction
+ // failed. Let the user resolve whatever the issue may be so that we get
a successful
+ // minor compaction on the tablet before closing it.
+ if (!isLastMinCSuccessful()) {
+ String msg = "Aborting close on " + extent
+ + " because last minor compaction was not successful. Check the
server log.";
+ log.warn(msg);
+ throw new RuntimeException(msg);
+ }
+
+ }
+
Review Comment:
> If I understand correctly, the observed issue was fixed by catching
Exception instead of IOException in MinorCompaction task. If that is correct
then I think it may be best to remove this code for the following reasons.
I believe that is true, I think that solved this specific issue. The code I
added in `Tablet.initiateClose` was to try and prevent the tablet from getting
into a bad state for some other unknown failure scenario. My concern is that a
user could miss the fact that minor compactions are failing. Then, either due
to a call to `split`, Tablet migration, or TabletServer close the Tablet gets
put into a half-closed state because `compactable.close` is called. The only
remedy for this situation currently is a hard shutdown of the TabletServer. The
user does not have the option of fixing the issue and getting a successful
minor compaction. Depending on how long this has been occurring, the recovery
time could be lengthy. It this issue affects multiple TabletServers, then you
could be looking at an entire system restart with a long recovery period.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]