[ https://issues.apache.org/jira/browse/HBASE-14368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023719#comment-15023719 ]

stack commented on HBASE-14368:
-------------------------------

[~enis] Let me know. I am a little wary of this area of the code at the moment
(while backporting it, I saw a hang in a 1.0+ version of HBase).

> New TestWALLockup broken by addendum added to parent issue
> ----------------------------------------------------------
>
>                 Key: HBASE-14368
>                 URL: https://issues.apache.org/jira/browse/HBASE-14368
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0
>
>         Attachments: 14368.txt, 14368.txt
>
>
> My second addendum broke TestWALLockup, the one that did this: 
> https://issues.apache.org/jira/browse/HBASE-14317?focusedCommentId=14730301&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14730301
> {code}
> diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> index 5708c30..c421f5c 100644
> --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> @@ -878,8 +878,19 @@ public class FSHLog implements WAL {
>          // Let the writer thread go regardless, whether error or not.
>          if (zigzagLatch != null) {
>            zigzagLatch.releaseSafePoint();
> -          // It will be null if we failed our wait on safe point above.
> -          if (syncFuture != null) blockOnSync(syncFuture);
> +          // syncFuture will be null if we failed our wait on safe point above. Otherwise, if
> +          // latch was obtained successfully, the sync we threw in either trigger the latch or it
> +          // got stamped with an exception because the WAL was damaged and we could not sync. Now
> +          // the write pipeline has been opened up again by releasing the safe point, process the
> +          // syncFuture we got above. This is probably a noop but it may be stale exception from
> +          // when old WAL was in place. Catch it if so.
> +          if (syncFuture != null) {
> +            try {
> +              blockOnSync(syncFuture);
> +            } catch (IOException ioe) {
> +              if (LOG.isTraceEnabled()) LOG.trace("Stale sync exception", ioe);
> +            }
> +          }
> {code}
> It broke the test because the test hand-feeds appends and syncs, controlling
> when they should throw exceptions. In the test we manufactured the case where
> an append fails, and we then asserted that the following sync would fail.
> The problem was that we expected the failure to be a dropped-snapshot failure,
> because a failed sync is a catastrophic event... but our hand feeding actually
> reproduced the case where a sync goes into the damaged file before the WAL has
> rolled, which is no longer a catastrophic event: we just catch the exception
> and move on.
> The attached patch just removes the checks for a dropped snapshot and for abort
> having been called.
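The catch-and-move-on behavior in the quoted FSHLog hunk can be sketched in isolation. This is a minimal, hypothetical model, not HBase code: a `CompletableFuture<Long>` stands in for HBase's SyncFuture, `blockOnSync` and `drainSyncAfterRoll` are illustrative names, the zigzag latch and safe-point release are left out, and java.util.logging replaces HBase's logger.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.logging.Level;
import java.util.logging.Logger;

final class StaleSyncSketch {
  private static final Logger LOG = Logger.getLogger(StaleSyncSketch.class.getName());

  /** Hypothetical stand-in for blockOnSync: waits, rethrowing any failure as IOException. */
  static long blockOnSync(CompletableFuture<Long> syncFuture) throws IOException {
    try {
      return syncFuture.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted waiting on sync", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) throw (IOException) cause;
      throw new IOException(cause);
    }
  }

  /**
   * Drains the sync thrown in before the roll, once the pipeline is open again.
   * A failure here is treated as stale (it came from the old, damaged WAL), so it
   * is logged at the finest level and swallowed rather than propagated.
   * Returns true only if a sync was present and completed successfully.
   */
  static boolean drainSyncAfterRoll(CompletableFuture<Long> syncFuture) {
    if (syncFuture == null) return false; // we failed our wait on the safe point
    try {
      blockOnSync(syncFuture);
      return true;
    } catch (IOException ioe) {
      if (LOG.isLoggable(Level.FINEST)) LOG.log(Level.FINEST, "Stale sync exception", ioe);
      return false;
    }
  }

  public static void main(String[] args) {
    CompletableFuture<Long> ok = CompletableFuture.completedFuture(42L);
    CompletableFuture<Long> stale = new CompletableFuture<>();
    stale.completeExceptionally(new IOException("sync against damaged WAL"));
    System.out.println(drainSyncAfterRoll(ok));    // true
    System.out.println(drainSyncAfterRoll(stale)); // false, and no exception escapes
    System.out.println(drainSyncAfterRoll(null));  // false
  }
}
```

In this model a sync that failed against the old, damaged file surfaces as an IOException that is logged and dropped, so nothing propagates up to an abort. That matches why a test that hand-feeds such a failure should no longer expect a dropped snapshot or an abort.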



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
