[ https://issues.apache.org/jira/browse/HBASE-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890844#comment-13890844 ]

Ding Yuan commented on HBASE-10452:
-----------------------------------

Thanks Ram! Attaching a patch against trunk. I did not fix "case 7" since it 
requires using HBCK to clear the node. I have little expertise in the HBase 
code base, and any attempt I make in this case is likely to do more harm than 
good. Again, any comments are much appreciated!

> Potential bugs in exception handlers
> ------------------------------------
>
>                 Key: HBASE-10452
>                 URL: https://issues.apache.org/jira/browse/HBASE-10452
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, master, regionserver, util
>    Affects Versions: 0.96.1
>            Reporter: Ding Yuan
>         Attachments: HBase-10452-trunk.patch
>
>
> Hi HBase developers,
> We are a group of researchers working on software reliability. Recently we 
> did a study and found that the majority of the most severe failures in HBase 
> are caused by bugs in exception-handling logic -- it is hard to anticipate 
> all the possible real-world error scenarios. We therefore built a simple 
> checking tool that automatically detects some bug patterns that have caused 
> very severe real-world failures. I am reporting some of the results here. 
> Any feedback is much appreciated!
> Ding
> =========================
> Case 1:
>   Line: 134, File: "org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java"
> {noformat}
>   protected void releaseTableLock() {
>     if (this.tableLock != null) {
>       try {
>         this.tableLock.release();
>       } catch (IOException ex) {
>         LOG.warn("Could not release the table lock", ex);
>         //TODO: if we get here, and not abort RS, this lock will never be released
>       }
>     }
>   }
> {noformat}
> The lock is never released if the exception occurs, which can lead to 
> deadlock or starvation.
> A similar code pattern can be found at:
>   Line: 135, File: "org/apache/hadoop/hbase/regionserver/SplitRequest.java"
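> One possible (untested) sketch of a stricter handling, assuming the request 
> keeps a reference to the owning region server (here called "server") and 
> that aborting it is acceptable when the lock cannot be released:
> {noformat}
>   protected void releaseTableLock() {
>     if (this.tableLock != null) {
>       try {
>         this.tableLock.release();
>       } catch (IOException ex) {
>         LOG.warn("Could not release the table lock", ex);
>         // Sketch only: escalate rather than leave the lock held forever.
>         // Assumes "server" refers to the region server owning this request.
>         server.abort("Failed to release table lock", ex);
>       }
>     }
>   }
> {noformat}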
> ==========================================
> =========================
> Case 2:
>   Line: 252, File: "org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java"
> {noformat}
>     try {
>       Field fEnd = SequenceFile.Reader.class.getDeclaredField("end");
>       fEnd.setAccessible(true);
>       end = fEnd.getLong(this.reader);
>     } catch(Exception e) { /* reflection fail. keep going */ }
> {noformat}
> The caught Exception seems too general. While a reflection failure might be 
> harmless, the try block can also throw other exceptions, including 
> "SecurityException", "IllegalAccessException", etc., and currently all of 
> them are silently ignored. A safer approach might be to ignore only the 
> specific, expected reflection failure while logging and handling other, 
> unexpected exceptions.
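> As a rough, untested sketch of that idea (field and reader names as in the 
> snippet above, assuming a LOG instance is available in the class):
> {noformat}
>     try {
>       Field fEnd = SequenceFile.Reader.class.getDeclaredField("end");
>       fEnd.setAccessible(true);
>       end = fEnd.getLong(this.reader);
>     } catch (NoSuchFieldException e) {
>       // expected reflection failure; keep going
>     } catch (SecurityException e) {
>       LOG.warn("Not allowed to access SequenceFile.Reader.end via reflection", e);
>     } catch (IllegalAccessException e) {
>       LOG.warn("Could not read SequenceFile.Reader.end via reflection", e);
>     }
> {noformat}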
> ==========================================
> =========================
> Case 3:
>   Line: 148, File: "org/apache/hadoop/hbase/HBaseConfiguration.java"
> {noformat}
>     try {
>       if (Class.forName("org.apache.hadoop.conf.ConfServlet") != null) {
>         isShowConf = true;
>       }
>     } catch (Exception e) {
>     }
> {noformat}
> Similar to the previous case, the exception handling is too general. While a 
> ClassNotFoundException might be the normal case and can be ignored, 
> Class.forName can also fail in other ways (e.g., with a LinkageError) under 
> some unexpected and rare error conditions. If that happens, the error is 
> lost. So maybe change it to something like the following:
> {noformat}
>     try {
>       if (Class.forName("org.apache.hadoop.conf.ConfServlet") != null) {
>         isShowConf = true;
>       }
>     } catch (LinkageError e) {
>       LOG.warn("..");
>       // handle linkage error
>     } catch (ExceptionInInitializerError e) {
>       LOG.warn("..");
>       // handle Initializer error
>     } catch (ClassNotFoundException e) {
>       LOG.debug("..");
>       // ignore
>     }
> {noformat}
> ==========================================
> =========================
> Case 4:
>   Line: 163, File: "org/apache/hadoop/hbase/client/Get.java"
> {noformat}
>   public Get setTimeStamp(long timestamp) {
>     try {
>       tr = new TimeRange(timestamp, timestamp+1);
>     } catch(IOException e) {
>       // Will never happen
>     }
>     return this;
>   }
> {noformat}
> Even if the IOException can never happen right now, could it happen in the 
> future due to a code change?
> At the very least there should be a log message. The current behavior is 
> dangerous: if the exception ever occurs in some unexpected scenario, it will 
> be silently swallowed.
> A similar code pattern can be found at:
>   Line: 300, File: "org/apache/hadoop/hbase/client/Scan.java"
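> A minimal, untested sketch of a louder variant (assuming a LOG instance is 
> available, or added, in Get):
> {noformat}
>   public Get setTimeStamp(long timestamp) {
>     try {
>       tr = new TimeRange(timestamp, timestamp + 1);
>     } catch (IOException e) {
>       // Should never happen today; surface it instead of swallowing it.
>       LOG.error("TimeRange construction failed for timestamp " + timestamp, e);
>       throw new IllegalArgumentException("Invalid timestamp: " + timestamp, e);
>     }
>     return this;
>   }
> {noformat}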
> ==========================================
> =========================
> Case 5:
>   Line: 207, File: "org/apache/hadoop/hbase/util/JVM.java"
> {noformat}
>       if (input != null){
>         try {
>           input.close();
>         } catch (IOException ignored) {
>         }
>       }
> {noformat}
> Any exception encountered during close is completely ignored, not even 
> logged. In particular, the same exception scenario is handled differently in 
> other methods of the same file:
> Line: 154, same file
> {noformat}
>        if (in != null){
>          try {
>            in.close();
>          } catch (IOException e) {
>            LOG.warn("Not able to close the InputStream", e);
>          }
>        }
> {noformat}
> Line: 248, same file
> {noformat}
>       if (in != null){
>         try {
>           in.close();
>         } catch (IOException e) {
>           LOG.warn("Not able to close the InputStream", e);
>         }
>       }
> {noformat}
> ==========================================
> =========================
> Case 6: empty handler for exception: java.io.IOException
>   Line: 312, File: "org/apache/hadoop/hbase/rest/RowResource.java"
> {noformat}
>     } finally {
>       if (table != null) try {
>         table.close();
>       } catch (IOException ioe) { }
>     }
> {noformat}
> The IOException is completely ignored. This behavior is inconsistent with 
> the same code snippet at line 249 in the same file, where the IOException is 
> logged:
> {noformat}
>     } finally {
>       if (table != null) try {
>         table.close();
>       } catch (IOException ioe) {
>         LOG.debug("Exception received while closing the table", ioe);
>       }
>     }
> {noformat}
> ==========================================
> =========================
> Case 7:
>   Line: 95, File: "org/apache/hadoop/hbase/master/handler/EnableTableHandler.java"
> {noformat}
>         try {
>           this.assignmentManager.getZKTable().removeEnablingTable(tableName, true);
>         } catch (KeeperException e) {
>           // TODO : Use HBCK to clear such nodes
>           LOG.warn("Failed to delete the ENABLING node for the table " + tableName
>               + ".  The table will remain unusable. Run HBCK to manually fix the problem.");
>         }
> {noformat}
> The log message in the exception handler and the comment seem to suggest that 
> such nodes should be cleared using HBCK.
> ==========================================
> =========================
> Case 8:
>   Line: 463, File: "org/apache/hadoop/hbase/client/ClientScanner.java"
> {noformat}
>         try {
>           this.caller.callWithRetries(callable);
>         } catch (IOException e) {
>           // We used to catch this error, interpret, and rethrow. However, we
>           // have since decided that it's not nice for a scanner's close to
>           // throw exceptions. Chances are it was just an UnknownScanner
>           // exception due to lease time out.
>         }
> {noformat}
> Currently the handler is empty because the exception may be "just an 
> UnknownScanner exception".
> But what if it has other causes? Maybe the catch block could differentiate 
> between the possible causes, ignoring those due to UnknownScanner while 
> handling the others differently?
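> For example, a rough, untested sketch along those lines (assuming 
> UnknownScannerException is the only benign case here):
> {noformat}
>         try {
>           this.caller.callWithRetries(callable);
>         } catch (UnknownScannerException e) {
>           // Expected when the scanner lease already timed out on the server.
>         } catch (IOException e) {
>           // Anything else is unexpected; at least leave a trace.
>           LOG.warn("Exception thrown while closing scanner", e);
>         }
> {noformat}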
> ==========================================



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
