[
https://issues.apache.org/jira/browse/HBASE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-14495:
--------------------------
Release Note: HBASE-12751 changed the WAL append. Every append now
sets a latch on an edit. The latch must be cleared or else the WAL will
hang. The original failures in TestHRegion turned up 'holes' where we were
failing to release the latch if we skipped out early because we were interrupted.
Other 'holes' were found where we had mocked up a WAL, so the latch would just
stay in place. Further holes were found when appending WAL markers: here we were
skipping the mvcc completely for a few edits. A cleanup of WALUtils made all
markers take the same code paths.
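To illustrate the kind of 'hole' described in the note, here is a minimal sketch. It is not the HBase code: the class LatchSketch, the Edit type, and the failEarly/releaseInFinally flags are all hypothetical, and a plain java.util.concurrent.CountDownLatch stands in for the latch that an append sets on an edit. It only shows why an early exit that skips the release leaves a waiter stuck, and why releasing in a finally block closes the hole.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchSketch {

    /** Hypothetical stand-in for an edit that carries a latch (not an HBase type). */
    public static final class Edit {
        private final CountDownLatch latch = new CountDownLatch(1);

        void release() {
            latch.countDown(); // counting down an already-open latch is a no-op
        }

        boolean await(long timeoutMs) throws InterruptedException {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    /**
     * Simulated append. When failEarly is true we bail out before the normal
     * release, the way an interrupted append might. When releaseInFinally is
     * true the latch is released on every path.
     */
    public static Edit append(boolean failEarly, boolean releaseInFinally) {
        Edit edit = new Edit();
        try {
            if (failEarly) {
                throw new IllegalStateException("simulated early exit");
            }
            edit.release(); // normal path clears the latch
        } catch (IllegalStateException e) {
            // Swallowed for the sketch; real code would propagate the failure.
        } finally {
            if (releaseInFinally) {
                edit.release(); // closes the 'hole': released even on early exit
            }
        }
        return edit;
    }

    /** Returns true if a waiter on the edit's latch gets through within 100 ms. */
    public static boolean waiterReturns(boolean failEarly, boolean releaseInFinally) {
        try {
            return append(failEarly, releaseInFinally).await(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("normal append:            " + waiterReturns(false, false));
        System.out.println("early exit, no finally:   " + waiterReturns(true, false));
        System.out.println("early exit, with finally: " + waiterReturns(true, true));
    }
}
```

The sketch uses a timeout so that the stuck case returns false; a waiter that blocks without a timeout would hang instead.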
> TestHRegion#testFlushCacheWhileScanning goes zombie
> ---------------------------------------------------
>
> Key: HBASE-14495
> URL: https://issues.apache.org/jira/browse/HBASE-14495
> Project: HBase
> Issue Type: Sub-task
> Components: test
> Reporter: stack
> Assignee: stack
> Attachments: 14495.txt, 14495.txt, 14495v3.txt, 14495v6.txt,
> 14495v7.txt, 14495v9.txt
>
>
> This test goes zombie on us, most recently, here:
> https://builds.apache.org/job/PreCommit-HBASE-Build/15744//console
> It does not fail on my internal rig runs, nor locally on my laptop when run
> in a loop.
> It's hung up in the close of the region:
> {code}
> "main" prio=10 tid=0x00007fc49800a800 nid=0x6053 in Object.wait() [0x00007fc4a02c9000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000007d07c3478> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:207)
> - locked <0x00000007d07c3478> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:143)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2257)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2061)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2026)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2016)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1423)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
> - locked <0x00000007d07c34a8> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1295)
> at org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:352)
> at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)
> {code}
> It is waiting on mvcc to catch up.
> There is this comment at the point where we are hung:
> // TODO: Lets see if we hang here, if there is a scenario where an outstanding reader
> // with a read point is in advance of this write point.
> mvcc.completeAndWait(writeEntry);
> The above came in with HBASE-12751. The comment was added at v29:
> https://issues.apache.org/jira/secure/attachment/12754775/12751.rebased.v29.txt
> Looks like I added it, so I must have had a suspicion that this might be
> dodgy... Let me take a look...