[ https://issues.apache.org/jira/browse/HBASE-14368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023719#comment-15023719 ]
stack commented on HBASE-14368:
-------------------------------
[~enis] Let me know. I am a little wary around this area of the code these days
(while backporting it, I saw a hang in a 1.0+ version of HBase).
> New TestWALLockup broken by addendum added to parent issue
> ----------------------------------------------------------
>
> Key: HBASE-14368
> URL: https://issues.apache.org/jira/browse/HBASE-14368
> Project: HBase
> Issue Type: Sub-task
> Components: test
> Reporter: stack
> Assignee: stack
> Fix For: 2.0.0
>
> Attachments: 14368.txt, 14368.txt
>
>
> My second addendum broke TestWALLockup, the one that did this:
> https://issues.apache.org/jira/browse/HBASE-14317?focusedCommentId=14730301&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14730301
> {code}
> diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> index 5708c30..c421f5c 100644
> --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> @@ -878,8 +878,19 @@ public class FSHLog implements WAL {
>        // Let the writer thread go regardless, whether error or not.
>        if (zigzagLatch != null) {
>          zigzagLatch.releaseSafePoint();
> -        // It will be null if we failed our wait on safe point above.
> -        if (syncFuture != null) blockOnSync(syncFuture);
> +        // syncFuture will be null if we failed our wait on safe point above. Otherwise, if
> +        // latch was obtained successfully, the sync we threw in either trigger the latch or it
> +        // got stamped with an exception because the WAL was damaged and we could not sync. Now
> +        // the write pipeline has been opened up again by releasing the safe point, process the
> +        // syncFuture we got above. This is probably a noop but it may be stale exception from
> +        // when old WAL was in place. Catch it if so.
> +        if (syncFuture != null) {
> +          try {
> +            blockOnSync(syncFuture);
> +          } catch (IOException ioe) {
> +            if (LOG.isTraceEnabled()) LOG.trace("Stale sync exception", ioe);
> +          }
> +        }
> {code}
> It broke the test because the test hand-feeds appends and syncs and dictates
> when they should throw exceptions. In the test we manufactured the case where
> an append fails, and we then asserted that the following sync would fail.
> The problem was that we expected the failure to be a dropped snapshot failure,
> because a failed sync is a catastrophic event. But our hand feeding actually
> reproduced the case where a sync goes into the damaged file before it has
> rolled, which is no longer a catastrophic event: we just catch it and move
> on.
> The attached patch just removes the checks for a dropped snapshot failure and
> for abort having been called.
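
A minimal sketch of the behavior change described above, to make the test breakage concrete. It does not use any HBase classes: `FakeSyncFuture`, `finishRollStrict`, and `finishRollTolerant` are hypothetical names invented for illustration, and the real `FSHLog` logic involves the safe point latch and logging that are left out here. The assumption is only what the description states: before the addendum a failed sync propagated to the caller, and after it the stale exception is caught and ignored.

```java
import java.io.IOException;

public class StaleSyncSketch {
    /** Hypothetical stand-in for a pending sync; NOT HBase's SyncFuture. */
    static final class FakeSyncFuture {
        private final IOException failure; // null means the sync succeeded

        FakeSyncFuture(IOException failure) {
            this.failure = failure;
        }

        /** Returns normally on success, rethrows the recorded failure otherwise. */
        void blockOnSync() throws IOException {
            if (failure != null) {
                throw failure;
            }
        }
    }

    /** Pre-addendum shape: a failed sync propagates to the caller. */
    static void finishRollStrict(FakeSyncFuture syncFuture) throws IOException {
        if (syncFuture != null) {
            syncFuture.blockOnSync();
        }
    }

    /**
     * Post-addendum shape: a failure from the old, damaged file is caught and
     * ignored. Returns true when an exception was swallowed (assumed return
     * value, added here only so the effect is observable).
     */
    static boolean finishRollTolerant(FakeSyncFuture syncFuture) {
        if (syncFuture == null) {
            return false;
        }
        try {
            syncFuture.blockOnSync();
            return false;
        } catch (IOException ioe) {
            // The real code logs this at trace level; here we only report it.
            return true;
        }
    }

    public static void main(String[] args) {
        FakeSyncFuture stale = new FakeSyncFuture(new IOException("sync into damaged WAL"));
        try {
            finishRollStrict(stale);
            System.out.println("strict: no exception");
        } catch (IOException ioe) {
            System.out.println("strict: propagated " + ioe.getMessage());
        }
        System.out.println("tolerant: swallowed=" + finishRollTolerant(stale));
    }
}
```

A test that asserts the strict outcome (the exception reaches the caller and triggers an abort) would fail against the tolerant version, which is the shape of the breakage the description reports.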