[ https://issues.apache.org/jira/browse/HBASE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-14495:
--------------------------
    Attachment: 14495v7.txt

TestBulkLoad was failing because it uses a mocked WAL, so the WALKey latches
were never getting closed. It is the only test in our suite that uses jmock,
so it took some jmock machinations to get the latch to trip.




    hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
    When writing markers, pass mvcc so we can complete the mvcc transaction and
    leave no holes in mvcc writes.
    If there is an exception appending an empty WAL edit, be sure to complete
    mvcc so there are no holes in mvcc.

    hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.java
    If mvcc is hung, log that we are STUCK. Added debug.

    hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
    When we write empty edits (just to get a sequenceid), rather than return
    early, go through all but the actual WAL append. We used to short-circuit
    out (not a problem, but could be if something gets added later).

    hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALUtil.java
    Refactor so the same thing happens around all marker appends. Adds a
    completion of the mvcc operation even on exception.

    hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
    The actual fix: if interrupted, complete the mvcc operation so we don't
    leave holes in mvcc.

    hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
    Add debug: name threads, add logging.

    hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestBulkLoad.java
    This is a good one. The WAL is mocked so the WALKey latch is never closed.
    Had to 'learn' jmock (this is the only test that uses it) and then add an
    'Action' just to trip the latch.
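
To make the WALKey and TestBulkLoad notes above concrete, here is a minimal
sketch in plain Java. It is not the patch and does not use HBase or jmock
classes: ToyMvcc, ToyWalKey, tripLatch and getWriteEntry are hypothetical
names invented for illustration. It assumes a simplified mvcc in which the
read point only advances over a contiguous run of completed writes, so one
write that never completes (a "hole") stalls everything behind it.

```java
import java.util.ArrayDeque;
import java.util.concurrent.CountDownLatch;

/** Hypothetical toy mvcc: the read point advances only over leading completed writes. */
final class ToyMvcc {
    static final class WriteEntry {
        final long writeNumber;
        boolean completed;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private final ArrayDeque<WriteEntry> pending = new ArrayDeque<>();
    private long nextWrite = 1;
    private long readPoint = 0;

    synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(nextWrite++);
        pending.addLast(e);
        return e;
    }

    synchronized void complete(WriteEntry e) {
        e.completed = true;
        // An incomplete entry at the head of the queue blocks all later ones.
        while (!pending.isEmpty() && pending.peekFirst().completed) {
            readPoint = pending.removeFirst().writeNumber;
        }
    }

    synchronized long getReadPoint() { return readPoint; }
}

/** Hypothetical toy WAL key: callers block on a latch until the WAL has handled the append. */
final class ToyWalKey {
    private final CountDownLatch assigned = new CountDownLatch(1);
    private final ToyMvcc mvcc;
    private final ToyMvcc.WriteEntry entry;

    ToyWalKey(ToyMvcc mvcc) {
        this.mvcc = mvcc;
        this.entry = mvcc.begin();
    }

    /** Called by the WAL (or, in a test with a mocked WAL, by the mock) to release waiters. */
    void tripLatch() { assigned.countDown(); }

    /** Waits for the latch; if interrupted, completes the write so no hole is left behind. */
    ToyMvcc.WriteEntry getWriteEntry() throws InterruptedException {
        try {
            assigned.await();
        } catch (InterruptedException ie) {
            mvcc.complete(entry);
            throw ie;
        }
        return entry;
    }
}
```

In this sketch, if nothing ever calls tripLatch(), getWriteEntry() blocks
forever. That is the situation a mocked WAL creates, and it is why the mock
has to be given some action that trips the latch.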

> TestHRegion#testFlushCacheWhileScanning goes zombie
> ---------------------------------------------------
>
>                 Key: HBASE-14495
>                 URL: https://issues.apache.org/jira/browse/HBASE-14495
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Assignee: stack
>         Attachments: 14495.txt, 14495.txt, 14495v3.txt, 14495v6.txt, 
> 14495v7.txt
>
>
> This test goes zombie on us, most recently, here: 
> https://builds.apache.org/job/PreCommit-HBASE-Build/15744//console
> It does not fail on my internal rig runs nor locally on laptop when run in a 
> loop.
> It's hung up in close of the region:
> {code}
> "main" prio=10 tid=0x00007fc49800a800 nid=0x6053 in Object.wait() [0x00007fc4a02c9000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000007d07c3478> (a java.lang.Object)
>       at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:207)
>       - locked <0x00000007d07c3478> (a java.lang.Object)
>       at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:143)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2257)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2061)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2026)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2016)
>       at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1423)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
>       - locked <0x00000007d07c34a8> (a java.lang.Object)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1295)
>       at org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:352)
>       at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)
> {code}
> It is waiting on mvcc to catch up.
> There is this comment at the point where we are hung:
>             // TODO: Lets see if we hang here, if there is a scenario where an outstanding reader
>             // with a read point is in advance of this write point.
>             mvcc.completeAndWait(writeEntry);
> The above came in with HBASE-12751. The comment was added at v29:
> https://issues.apache.org/jira/secure/attachment/12754775/12751.rebased.v29.txt
> Looks like I added it, so I must have had a suspicion that this might be
> dodgy... Let me take a look...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
