stack created HBASE-14368:
-----------------------------
Summary: New TestWALLockup broken by addendum added to parent issue
Key: HBASE-14368
URL: https://issues.apache.org/jira/browse/HBASE-14368
Project: HBase
Issue Type: Sub-task
Components: test
Reporter: stack
Assignee: stack
My second addendum broke TestWALLockup, the one that did this:
https://issues.apache.org/jira/browse/HBASE-14317?focusedCommentId=14730301&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14730301
{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
index 5708c30..c421f5c 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
@@ -878,8 +878,19 @@ public class FSHLog implements WAL {
         // Let the writer thread go regardless, whether error or not.
         if (zigzagLatch != null) {
           zigzagLatch.releaseSafePoint();
-          // It will be null if we failed our wait on safe point above.
-          if (syncFuture != null) blockOnSync(syncFuture);
+          // syncFuture will be null if we failed our wait on safe point above. Otherwise, if
+          // latch was obtained successfully, the sync we threw in either trigger the latch or it
+          // got stamped with an exception because the WAL was damaged and we could not sync. Now
+          // the write pipeline has been opened up again by releasing the safe point, process the
+          // syncFuture we got above. This is probably a noop but it may be stale exception from
+          // when old WAL was in place. Catch it if so.
+          if (syncFuture != null) {
+            try {
+              blockOnSync(syncFuture);
+            } catch (IOException ioe) {
+              if (LOG.isTraceEnabled()) LOG.trace("Stale sync exception", ioe);
+            }
+          }
{code}
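For context, here is a minimal, self-contained sketch of the control flow the addendum introduces (plain Java with hypothetical names like releaseSafePointThenDrain and a stand-in blockOnSync; not HBase code): release the safe point first, then drain a sync future that may already be stamped with a stale exception from the old, damaged WAL, catching and logging that exception instead of letting it propagate.
{code}
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;

// Hypothetical stand-in for the FSHLog logic in the diff above: release the
// safe point, then drain a sync future that may carry a stale exception.
public class StaleSyncSketch {

  static void releaseSafePointThenDrain(CountDownLatch safePoint,
      CompletableFuture<Void> syncFuture) {
    // Let the writer thread go regardless, whether error or not.
    safePoint.countDown();
    // syncFuture is null if the wait on the safe point failed earlier.
    if (syncFuture != null) {
      try {
        blockOnSync(syncFuture);
      } catch (IOException ioe) {
        // Probably a stale exception from when the old WAL was in place.
        System.out.println("Stale sync exception (ignored): " + ioe.getMessage());
      }
    }
  }

  // Hypothetical equivalent of a blockOnSync helper: wait for the sync and
  // rethrow any failure as an IOException.
  static void blockOnSync(CompletableFuture<Void> syncFuture) throws IOException {
    try {
      syncFuture.get();
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException(e);
    }
  }

  public static void main(String[] args) {
    CountDownLatch safePoint = new CountDownLatch(1);
    // Simulate a sync that was stamped with an exception because the old WAL
    // was damaged before it could be rolled.
    CompletableFuture<Void> syncFuture = new CompletableFuture<>();
    syncFuture.completeExceptionally(new IOException("sync failed on damaged WAL"));
    releaseSafePointThenDrain(safePoint, syncFuture);
    System.out.println("Write pipeline keeps going; no abort.");
  }
}
{code}
The point of the sketch is only the ordering: the pipeline is reopened by releasing the safe point, and any exception left on the sync future from the old WAL is treated as stale and swallowed.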
It broke the test because the test hand-feeds appends and syncs, controlling when they should throw exceptions. In the test we manufactured the case where an append fails and then asserted that the following sync would fail.
Problem was that we expected the failure to be a dropped snapshot failure, because failure of a sync is a catastrophic event... but our hand-feeding actually reproduced the case where a sync goes into the damaged file before it has been rolled... which is no longer a catastrophic event... we just catch the exception and move on.
The attached patch just removes the check for a dropped snapshot and the check that abort was called.
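Roughly speaking (a hypothetical JUnit-style sketch reusing the stand-in class above; not the actual TestWALLockup code or the attached patch), the expectation that changes looks like this: a sync that fails against the damaged WAL before the roll is now swallowed, so the test can no longer assert a dropped snapshot and an abort.
{code}
import static org.junit.Assert.assertFalse;

import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import org.junit.Test;

// Hypothetical, simplified stand-in for the scenario TestWALLockup manufactures;
// names and structure are illustrative only, reusing StaleSyncSketch from above.
public class StaleSyncSketchTest {

  @Test
  public void syncFailureOnDamagedWalIsNotCatastrophic() {
    CountDownLatch safePoint = new CountDownLatch(1);
    CompletableFuture<Void> syncFuture = new CompletableFuture<>();
    // Hand-feed the failure: the sync went into the damaged file before the roll.
    syncFuture.completeExceptionally(new IOException("append failed, sync failed"));

    boolean catastrophic = false;
    try {
      StaleSyncSketch.releaseSafePointThenDrain(safePoint, syncFuture);
    } catch (RuntimeException e) {
      catastrophic = true; // the old expectation: dropped snapshot + abort
    }
    // With the addendum, the stale exception is caught and logged, not rethrown.
    assertFalse("sync failure before the roll should no longer be catastrophic", catastrophic);
  }
}
{code}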