[ https://issues.apache.org/jira/browse/HBASE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-14495:
--------------------------
Attachment: 14495v7.txt
TestBulkLoad was failing because it had a mocked WAL so the WALKey latches were
never getting closed. This is the only test in our suite that uses jmock, so it
took some jmock machinations to get the latch triggered.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
When writing markers, pass mvcc so we can complete the mvcc transaction; leave no
holes in mvcc writes.
If there is an exception appending an empty WAL edit, be sure to complete mvcc so
there are no holes in mvcc.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.java
If mvcc is hung, log that we are STUCK. Added debug logging.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
When we write empty edits (just to get a sequenceid), rather than return early, go
through all but the actual WAL append. We used to short-circuit out (not a problem,
but could be if something gets added later).
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALUtil.java
Refactor so the same thing happens around all marker appends. Adds completion of
the mvcc operation even on exception.
hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
The actual fix: if interrupted, complete the mvcc operation so we don't leave holes
in mvcc.
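To make the "hole" idea concrete, here is a minimal sketch. It is a toy model, not HBase's MultiVersionConcurrencyControl or WALKey: the class names ToyMvcc and ToyWal and the method waitForSequenceId are made up for illustration. The read point only advances over a contiguous run of completed writes, so one write that is begun and never completed blocks everything behind it. The wait on the latch therefore completes the entry before rethrowing the interrupt.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CountDownLatch;

/** Toy model of mvcc bookkeeping (hypothetical; not the HBase class). */
final class ToyMvcc {
    /** One in-flight write, identified by its sequence number. */
    static final class WriteEntry {
        final long seq;
        boolean done;
        WriteEntry(long seq) { this.seq = seq; }
    }

    private final Deque<WriteEntry> queue = new ArrayDeque<>();
    private long nextSeq = 1;
    private long readPoint = 0;

    /** Start a write; it stays queued until complete() is called on it. */
    synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(nextSeq++);
        queue.addLast(e);
        return e;
    }

    /** Mark a write done and advance the read point over the completed prefix. */
    synchronized void complete(WriteEntry e) {
        e.done = true;
        while (!queue.isEmpty() && queue.peekFirst().done) {
            readPoint = queue.removeFirst().seq;
        }
        notifyAll(); // wake anything waiting for the read point to move
    }

    synchronized long readPoint() { return readPoint; }
}

/** Hypothetical stand-in for the wait on a sequence-id latch. */
final class ToyWal {
    static long waitForSequenceId(ToyMvcc mvcc, ToyMvcc.WriteEntry entry,
                                  CountDownLatch assigned) throws InterruptedException {
        try {
            assigned.await();
            return entry.seq;
        } catch (InterruptedException ie) {
            // Complete the entry so the interrupted write does not leave a hole.
            mvcc.complete(entry);
            throw ie;
        }
    }
}
```

In this model, if write 1 is begun, write 2 is begun and completed, and the waiter for write 1 is interrupted, the read point moves straight to 2 once the catch block runs. Without the complete() call in the catch block it would stay at 0 and anything waiting on the read point would hang.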
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
Add debug... name threads, add logging.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBulkLoad.java
This is a good one. The WAL is mocked so the WALKey latch is never closed.
Had to 'learn' jmock -- this is the only test that uses it -- and then add an
'Action' just to trip the latch.
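A rough sketch of why the mock needed a side effect. This does not use jmock and is not the TestBulkLoad code; LatchedKey, SimpleWal, DoNothingWal and LatchTrippingWal are hypothetical names in plain Java. A stubbed append that only returns a value never releases the latch the caller is waiting on, so the stub has to trip the latch itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical key whose sequence id is published through a latch. */
final class LatchedKey {
    private final CountDownLatch assigned = new CountDownLatch(1);
    private volatile long sequenceId = -1;

    /** Record the id and release anyone waiting on it. */
    void setSequenceId(long id) {
        sequenceId = id;
        assigned.countDown();
    }

    /** Returns the id, or -1 if none was assigned within the timeout. */
    long awaitSequenceId(long timeoutMillis) throws InterruptedException {
        return assigned.await(timeoutMillis, TimeUnit.MILLISECONDS) ? sequenceId : -1;
    }
}

/** Hypothetical, cut-down WAL interface for the sketch. */
interface SimpleWal {
    long append(LatchedKey key);
}

/** Behaves like a plain mock: returns a value, has no side effects. */
final class DoNothingWal implements SimpleWal {
    @Override
    public long append(LatchedKey key) {
        return 0; // the latch inside the key is never released
    }
}

/** Behaves like a mock with an extra action that trips the latch. */
final class LatchTrippingWal implements SimpleWal {
    private long next = 1;

    @Override
    public long append(LatchedKey key) {
        long id = next++;
        key.setSequenceId(id); // the side effect the real WAL would have had
        return id;
    }
}
```

With DoNothingWal, awaitSequenceId times out and returns -1 (the real code has no timeout, so it hangs). With LatchTrippingWal the wait returns the assigned id.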
> TestHRegion#testFlushCacheWhileScanning goes zombie
> ---------------------------------------------------
>
> Key: HBASE-14495
> URL: https://issues.apache.org/jira/browse/HBASE-14495
> Project: HBase
> Issue Type: Sub-task
> Components: test
> Reporter: stack
> Assignee: stack
> Attachments: 14495.txt, 14495.txt, 14495v3.txt, 14495v6.txt,
> 14495v7.txt
>
>
> This test goes zombie on us, most recently, here:
> https://builds.apache.org/job/PreCommit-HBASE-Build/15744//console
> It does not fail on my internal rig runs nor locally on laptop when run in a
> loop.
> It's hung up in close of the region:
> {code}
> "main" prio=10 tid=0x00007fc49800a800 nid=0x6053 in Object.wait() [0x00007fc4a02c9000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000007d07c3478> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:207)
> - locked <0x00000007d07c3478> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:143)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2257)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2061)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2026)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2016)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1423)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
> - locked <0x00000007d07c34a8> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1295)
> at org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:352)
> at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)
> {code}
> It is waiting on mvcc to catch up.
> There is this comment at the point where we are hung:
> // TODO: Lets see if we hang here, if there is a scenario where an outstanding reader
> // with a read point is in advance of this write point.
> mvcc.completeAndWait(writeEntry);
> The above came in with HBASE-12751. The comment was added at v29:
> https://issues.apache.org/jira/secure/attachment/12754775/12751.rebased.v29.txt
> Looks like I added it, so I must have had a premonition that this might be
> dodgy... Let me take a look...