ndimiduk commented on a change in pull request #1311: HBASE-23984 [Flakey
Tests] TestMasterAbortAndRSGotKilled fails in tea…
URL: https://github.com/apache/hbase/pull/1311#discussion_r395316718
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
##########
@@ -106,20 +105,11 @@ public void process() {
}
// Close the region
- try {
- if (region.close(abort) == null) {
- // This region got closed. Most likely due to a split.
- // The split message will clean up the master state.
- LOG.warn("Can't close region {}, was already closed during close()",
name);
- return;
- }
- } catch (IOException ioe) {
- // An IOException here indicates that we couldn't successfully flush
the
- // memstore before closing. So, we need to abort the server and allow
- // the master to split our logs in order to recover the data.
- server.abort("Unrecoverable exception while closing region " +
- regionInfo.getRegionNameAsString() + ", still finishing close", ioe);
- throw new RuntimeException(ioe);
Review comment:
After reading the above comment and seeing you discarded the throwing of
this exception, I initially choked. But reading through the actual use of these
`Handlers` in the `ExecutorService` instance hanging off of `HRegionServer`,
and `HMaster` I can only conclude that the above throw was only wishful
thinking. There's even a
[comment](https://github.com/apache/hbase/blob/678b142da2b75ab6697125bbfdd33e32851650bf/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1381-L1387)
(emphasis mine):
> Start up all services. **If any of these threads gets an unhandled
exception**
> **then they just die with a logged message.** This should be fine because
> in general, we do not expect the master to get such unhandled exceptions
> as OOMEs; it should be lightly loaded. See what HRegionServer does if
> need to install an unexpected exception handler.
The author of the above comment speaks wistfully of what i can only assume
is
[`HRegionServer#uncaughtExceptionHandler`](https://github.com/apache/hbase/blob/8e26761fd01408471a25d481afed64faf34f7574/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L625-L626).
However, it doesn't appear that this is threaded down into the executor
service, which means this line's `throw` statement is simply logged and ignored.
So yes, I think removing the `throw` is the right choice. It removes the
false sense of handling this error condition correctly. It's really the `abort`
that protects the content of the memstore.
Also, why is there not a named exception thrown by the memstore when it
cannot flush? Seems like a useful point in that data structure's API.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services