[
https://issues.apache.org/jira/browse/HBASE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Yu updated HBASE-10452:
---------------------------
Summary: Fix potential bugs in exception handlers (was: Potential bugs in
exception handlers)
> Fix potential bugs in exception handlers
> ----------------------------------------
>
> Key: HBASE-10452
> URL: https://issues.apache.org/jira/browse/HBASE-10452
> Project: HBase
> Issue Type: Bug
> Components: Client, master, regionserver, util
> Affects Versions: 0.96.1
> Reporter: Ding Yuan
> Attachments: HBase-10452-trunk-v2.patch, HBase-10452-trunk-v3.patch,
> HBase-10452-trunk.patch
>
>
> Hi HBase developers,
> We are a group of researchers working on software reliability. Recently we did a
> study and found that the majority of the most severe failures in HBase are caused
> by bugs in exception-handling logic -- it is hard to anticipate all the possible
> real-world error scenarios. Therefore we built a simple checking tool that
> automatically detects some bug patterns that have caused very severe real-world
> failures. I am reporting some of the results here. Any feedback is much
> appreciated!
> Ding
> =========================
> Case 1:
> Line: 134, File:
> "org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java"
> {noformat}
> protected void releaseTableLock() {
>   if (this.tableLock != null) {
>     try {
>       this.tableLock.release();
>     } catch (IOException ex) {
>       LOG.warn("Could not release the table lock", ex);
>       //TODO: if we get here, and not abort RS, this lock will never be released
>     }
>   }
> }
> {noformat}
> The lock is not released if the exception occurs, causing potential deadlock
> or starvation.
> A similar code pattern can be found at:
> Line: 135, File: "org/apache/hadoop/hbase/regionserver/SplitRequest.java"
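> One possible direction, following the TODO in the snippet, is to escalate instead
> of only logging. Below is a minimal sketch, under the assumption that the request
> object keeps a reference to the hosting region server (a field called "server"
> here, which implements Abortable); whether aborting is too drastic is of course
> up to the developers:
> {noformat}
>   protected void releaseTableLock() {
>     if (this.tableLock != null) {
>       try {
>         this.tableLock.release();
>       } catch (IOException ex) {
>         LOG.warn("Could not release the table lock", ex);
>         // As the TODO notes, without an abort the lock is leaked until it
>         // expires; escalate so the failure is at least visible and bounded.
>         server.abort("Failed to release table lock", ex);
>       }
>     }
>   }
> {noformat}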
> ==========================================
> =========================
> Case 2:
> Line: 252, File:
> "org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java"
> {noformat}
>     try {
>       Field fEnd = SequenceFile.Reader.class.getDeclaredField("end");
>       fEnd.setAccessible(true);
>       end = fEnd.getLong(this.reader);
>     } catch(Exception e) { /* reflection fail. keep going */ }
> {noformat}
> The caught Exception seems too general.
> While a reflection failure might be harmless here, the try block can also throw
> other exceptions, including "SecurityException", "IllegalAccessException", etc.
> Currently all of those exceptions are silently ignored. A safer approach would be
> to ignore only the specific, expected reflection failure while logging and
> handling the other, unexpected exception types.
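> A minimal sketch of that narrower handling (which exceptions to tolerate versus
> log is a judgment call; the names are taken from the snippet above):
> {noformat}
>     try {
>       Field fEnd = SequenceFile.Reader.class.getDeclaredField("end");
>       fEnd.setAccessible(true);
>       end = fEnd.getLong(this.reader);
>     } catch (NoSuchFieldException e) {
>       // Expected when this Hadoop version has no "end" field; keep going.
>     } catch (SecurityException e) {
>       // A security manager refused reflective access; worth recording.
>       LOG.warn("Not allowed to access SequenceFile.Reader.end via reflection", e);
>     } catch (IllegalAccessException e) {
>       // Should not happen after setAccessible(true); worth recording.
>       LOG.warn("Could not read SequenceFile.Reader.end via reflection", e);
>     }
> {noformat}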
> ==========================================
> =========================
> Case 3:
> Line: 148, File: "org/apache/hadoop/hbase/HBaseConfiguration.java"
> {noformat}
>     try {
>       if (Class.forName("org.apache.hadoop.conf.ConfServlet") != null) {
>         isShowConf = true;
>       }
>     } catch (Exception e) {
>     }
> {noformat}
> Similar to the previous case, the exception handling is too general. While
> ClassNotFoundException is the expected, ignorable case, Class.forName can also
> fail in other, rarer ways (e.g., with a LinkageError or an
> ExceptionInInitializerError). Catching the specific cases makes the intent
> explicit and keeps unexpected failures from being silently lost. So maybe change
> it to something like:
> {noformat}
>     try {
>       if (Class.forName("org.apache.hadoop.conf.ConfServlet") != null) {
>         isShowConf = true;
>       }
>     } catch (ClassNotFoundException e) {
>       LOG.debug("..");
>       // expected when ConfServlet is not on the classpath; ignore
>     } catch (ExceptionInInitializerError e) {
>       LOG.warn("..");
>       // handle initializer error
>     } catch (LinkageError e) {
>       LOG.warn("..");
>       // handle other linkage errors; ExceptionInInitializerError is a subclass
>       // of LinkageError, so it has to be caught first for this to compile
>     }
> {noformat}
> ==========================================
> =========================
> Case 4:
> Line: 163, File: "org/apache/hadoop/hbase/client/Get.java"
> {noformat}
>   public Get setTimeStamp(long timestamp) {
>     try {
>       tr = new TimeRange(timestamp, timestamp+1);
>     } catch(IOException e) {
>       // Will never happen
>     }
>     return this;
>   }
> {noformat}
> Even if the IOException can never happen right now, could it start happening in
> the future due to a code change?
> At the very least there should be a log message. The current behavior is
> dangerous: if the exception ever does occur in some unexpected scenario, it will
> be silently swallowed.
> A similar code pattern can be found at:
> Line: 300, File: "org/apache/hadoop/hbase/client/Scan.java"
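> One way to make such a failure visible, sketched under the assumption that a LOG
> field is available (or added) in the class; whether to rethrow as an unchecked
> exception or merely log is debatable:
> {noformat}
>   public Get setTimeStamp(long timestamp) {
>     try {
>       tr = new TimeRange(timestamp, timestamp+1);
>     } catch(IOException e) {
>       // Not expected for a single-timestamp range, but do not swallow it silently.
>       LOG.error("Failed to create TimeRange for timestamp " + timestamp, e);
>       throw new IllegalArgumentException("Invalid timestamp: " + timestamp, e);
>     }
>     return this;
>   }
> {noformat}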
> ==========================================
> =========================
> Case 5:
> Line: 207, File: "org/apache/hadoop/hbase/util/JVM.java"
> {noformat}
>     if (input != null){
>       try {
>         input.close();
>       } catch (IOException ignored) {
>       }
>     }
> {noformat}
> Any exception encountered in close is completely ignored, not even logged.
> In particular, the same exception scenario was handled differently in other
> methods in the same file:
> Line: 154, same file
> {noformat}
>     if (in != null){
>       try {
>         in.close();
>       } catch (IOException e) {
>         LOG.warn("Not able to close the InputStream", e);
>       }
>     }
> {noformat}
> Line: 248, same file
> {noformat}
>     if (in != null){
>       try {
>         in.close();
>       } catch (IOException e) {
>         LOG.warn("Not able to close the InputStream", e);
>       }
>     }
> {noformat}
> ==========================================
> =========================
> Case 6: empty handler for exception: java.io.IOException
> Line: 312, File: "org/apache/hadoop/hbase/rest/RowResource.java"
> {noformat}
>     } finally {
>       if (table != null) try {
>         table.close();
>       } catch (IOException ioe) { }
>     }
> {noformat}
> The IOException is completely ignored. This behavior is inconsistent with the
> same code snippet at line 249 in the same file, where the IOException is logged:
> {noformat}
>     } finally {
>       if (table != null) try {
>         table.close();
>       } catch (IOException ioe) {
>         LOG.debug("Exception received while closing the table", ioe);
>       }
>     }
> {noformat}
> ==========================================
> =========================
> Case 7:
> Line: 95, File:
> "org/apache/hadoop/hbase/master/handler/EnableTableHandler.java"
> {noformat}
>     try {
>       this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
>     } catch (KeeperException e) {
>       // TODO : Use HBCK to clear such nodes
>       LOG.warn("Failed to delete the ENABLING node for the table " + tableName
>           + ". The table will remain unusable. Run HBCK to manually fix the problem.");
>     }
> {noformat}
> The log message in the exception handler and the comment suggest that such nodes
> should be cleared manually using HBCK; note also that the KeeperException itself
> is not included in the warning, so the root cause is lost.
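> Independent of the HBCK question, a small sketch that at least attaches the
> KeeperException to the warning so the root cause is not lost (same names as in
> the snippet above):
> {noformat}
>     try {
>       this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
>     } catch (KeeperException e) {
>       // TODO : Use HBCK to clear such nodes
>       LOG.warn("Failed to delete the ENABLING node for the table " + tableName
>           + ". The table will remain unusable. Run HBCK to manually fix the problem.", e);
>     }
> {noformat}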
> ==========================================
> =========================
> Case 8:
> Line: 463, File: "org/apache/hadoop/hbase/client/ClientScanner.java"
> {noformat}
>     try {
>       this.caller.callWithRetries(callable);
>     } catch (IOException e) {
>       // We used to catch this error, interpret, and rethrow. However, we
>       // have since decided that it's not nice for a scanner's close to
>       // throw exceptions. Chances are it was just an UnknownScanner
>       // exception due to lease time out.
>     }
> {noformat}
> Currently the handler is empty because the exception is assumed to be "just an
> UnknownScanner exception".
> But what if it has other causes? Maybe the catch block could differentiate
> between the causes, ignoring an UnknownScannerException while handling (or at
> least logging) everything else.
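> A sketch of that differentiation, assuming UnknownScannerException (which is an
> IOException subclass in HBase) is the benign case and everything else is at
> least logged:
> {noformat}
>     try {
>       this.caller.callWithRetries(callable);
>     } catch (UnknownScannerException e) {
>       // Expected when the scanner lease has already expired on the server;
>       // closing such a scanner is effectively a no-op, so ignoring is fine.
>       LOG.debug("Scanner already gone on the server", e);
>     } catch (IOException e) {
>       // Anything else is unexpected; do not let it vanish silently.
>       LOG.warn("Exception while closing the scanner", e);
>     }
> {noformat}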
> ==========================================
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)