dlmarion commented on code in PR #3677:
URL: https://github.com/apache/accumulo/pull/3677#discussion_r1289956468
##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java:
##########
@@ -901,6 +902,39 @@ public void close(boolean saveState) throws IOException {
void initiateClose(boolean saveState) {
log.trace("initiateClose(saveState={}) {}", saveState, getExtent());
+ // Check to see if last or current minc is failing. If so, then throw
+ // an exception before closing the compactable and leaving the tablet
+ // in a half-closed state. Don't throw IllegalStateException because
+ // calling code will just continue to retry.
+ if (saveState) {
+ if (!isLastMinCSuccessful()) {
+ if (isMinorCompactionRunning()) {
+ // Then current minor compaction is retrying and failure is being
+ // reported.
+ String msg = "Aborting close on " + extent
+ + " because last minor compaction was not successful. Check the
server log.";
+ log.warn(msg);
+ throw new RuntimeException(msg);
+ } else {
+ // We don't know when the last minc occurred. Kick one off now. It's
possible
+ // that a minc will not be initiated (if there is no data in the
table for example)
+ if (initiateMinorCompaction(MinorCompactionReason.CLOSE)) {
+ getTabletMemory().waitForMinC();
+ }
+ }
+ }
+ // We don't want to initiate the close process on the tablet if the last
minor compaction
+ // failed. Let the user resolve whatever the issue may be so that we get
a successful
+ // minor compaction on the tablet before closing it.
+ if (!isLastMinCSuccessful()) {
+ String msg = "Aborting close on " + extent
+ + " because last minor compaction was not successful. Check the
server log.";
+ log.warn(msg);
+ throw new RuntimeException(msg);
+ }
+
+ }
+
Review Comment:
> If we find more places where the minor compaction code could have retried
and it did not, then I think we should modify that minor compaction code to
make it retry there.
If we do that, and remove this check or something like it, then until we
found and fixed all of the places where this happens Tablets could still get
into a bad state where the only remedy is a restart of all TabletServers.
> This code has some race conditions, for example a minor compaction could
fail in another thread right after the check in this function. So the checks
here do not completely solve the problem.
Agreed, it's best effort. Maybe there is a way to fix the race condition and
prevent another minc from starting?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]