[ https://issues.apache.org/jira/browse/HBASE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-14495:
--------------------------
    Attachment: 14495v9.txt

v9

        HBASE-14495 TestHRegion#testFlushCacheWhileScanning goes zombie

        hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          When writing markers, pass the mvcc so the mvcc transaction can be completed;
          leave no holes in mvcc writes.
          If an exception occurs appending an empty WAL edit, be sure to complete the
          mvcc transaction so there are no holes in mvcc.

        hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.java
          If mvcc is hung, log that we are STUCK. Added debug logging.

        hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
          When we write empty edits (just to get a sequenceid), go through every step
          except the actual WAL append rather than returning early. We used to
          short-circuit out; that is not a problem today, but it could be if something
          gets added later.

        hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALUtil.java
          Refactor so the same thing happens around all marker appends. Completes the
          mvcc operation even on exception.

        hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
          The actual fix: if interrupted, complete the mvcc operation so we don't leave
          holes in mvcc.

        hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Add debug: name threads, add logging.
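The HRegion, WALUtil, and WALKey notes above apply one rule: every mvcc write entry that is begun must be completed on every exit path, because the read point cannot advance past an entry that is still outstanding. The following is a simplified, hypothetical model of that rule, not HBase's code. `ToyMvcc`, `writeMarker`, `MarkerAppender`, and `ToyWalKey` are illustrative names. The model completes the entry in a `finally` block when a marker append throws, and it completes an already-assigned entry when a waiter is interrupted.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CountDownLatch;

/** Hypothetical model (not HBase's real classes) of "leave no holes in mvcc". */
public class NoMvccHolesDemo {

  static final class WriteEntry {
    final long writeNumber;
    boolean completed;
    WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
  }

  /** The read point only advances over completed entries at the head of the queue. */
  static final class ToyMvcc {
    private final Deque<WriteEntry> queue = new ArrayDeque<>();
    private long writePoint;
    private long readPoint;

    synchronized WriteEntry begin() {
      WriteEntry e = new WriteEntry(++writePoint);
      queue.addLast(e);
      return e;
    }

    synchronized void complete(WriteEntry e) {
      e.completed = true;
      // Stop at the first entry still in flight: an abandoned entry is a "hole".
      while (!queue.isEmpty() && queue.peekFirst().completed) {
        readPoint = queue.pollFirst().writeNumber;
      }
    }

    synchronized long readPoint() { return readPoint; }
  }

  /** Hypothetical callback standing in for a WAL marker append. */
  interface MarkerAppender {
    void append(long writeNumber) throws IOException;
  }

  /** One helper for all marker types: the entry is completed on success and on failure. */
  static void writeMarker(ToyMvcc mvcc, MarkerAppender appender) throws IOException {
    WriteEntry e = mvcc.begin();
    try {
      appender.append(e.writeNumber);
    } finally {
      mvcc.complete(e);
    }
  }

  /** Hypothetical WAL key whose write entry is handed over by another thread. */
  static final class ToyWalKey {
    private final ToyMvcc mvcc;
    private final CountDownLatch assigned = new CountDownLatch(1);
    private volatile WriteEntry writeEntry;

    ToyWalKey(ToyMvcc mvcc) { this.mvcc = mvcc; }

    void assign(WriteEntry e) {
      writeEntry = e;
      assigned.countDown();
    }

    WriteEntry getWriteEntry() throws InterruptedIOException {
      try {
        assigned.await();
      } catch (InterruptedException ie) {
        WriteEntry e = writeEntry;
        if (e != null) {
          mvcc.complete(e); // don't abandon an assigned entry: that would be a hole
        }
        InterruptedIOException iioe = new InterruptedIOException("interrupted waiting for write entry");
        iioe.initCause(ie);
        throw iioe;
      }
      return writeEntry;
    }
  }

  public static void main(String[] args) {
    ToyMvcc mvcc = new ToyMvcc();
    try {
      writeMarker(mvcc, n -> { throw new IOException("marker append failed"); });
    } catch (IOException expected) {
      // write 1 failed, but the helper still completed it
    }
    ToyWalKey key = new ToyWalKey(mvcc);
    key.assign(mvcc.begin());           // write 2
    Thread.currentThread().interrupt(); // simulate an interrupted waiter
    try {
      key.getWriteEntry();
    } catch (InterruptedIOException expected) {
      // write 2 was completed before the exception propagated
    }
    System.out.println("readPoint=" + mvcc.readPoint());
  }
}
```

Running `main` prints `readPoint=2`. Both the failed marker write and the interrupted waiter's write were completed, so neither left a hole. Putting the completion in `finally` is one way to cover both the success and failure paths. The commit note says only that completion now also happens on exception.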

Disabled TestDistributedLogSplitting#testWorkerAbort because it is flaky (flaky
locally as well as up on Jenkins).
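The FSHLog note describes empty edits (written only to obtain a sequenceid) taking the same path as real edits and skipping only the physical write. Below is a hypothetical sketch of that idea. It assumes a single-threaded log and uses illustrative names (`Entry`, `append`, `postAppendCalls`) that are not FSHLog's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append path: empty edits skip only the physical write. */
public class EmptyEditAppendDemo {

  static final class Entry {
    final List<String> edits;
    long sequenceId = -1;
    Entry(List<String> edits) { this.edits = edits; }
    boolean isEmpty() { return edits.isEmpty(); }
  }

  private long nextSequenceId = 1;
  private long highestSequenceId = 0;
  final List<Entry> written = new ArrayList<>(); // stands in for the WAL file
  int postAppendCalls = 0;                       // counts bookkeeping-hook runs

  void append(Entry entry) {
    // 1. Always stamp a sequence id, even for empty edits.
    entry.sequenceId = nextSequenceId++;
    // 2. Only the physical write is skipped for empty edits.
    if (!entry.isEmpty()) {
      written.add(entry);
    }
    // 3. Post-append bookkeeping runs for every entry. An early return for
    //    empty edits would silently skip anything added here later.
    highestSequenceId = Math.max(highestSequenceId, entry.sequenceId);
    postAppendCalls++;
  }

  long highestSequenceId() { return highestSequenceId; }

  public static void main(String[] args) {
    EmptyEditAppendDemo wal = new EmptyEditAppendDemo();
    wal.append(new Entry(List.of("put row1")));
    Entry marker = new Entry(List.of());
    wal.append(marker);
    System.out.println("marker seqId=" + marker.sequenceId
        + ", written=" + wal.written.size()
        + ", postAppendCalls=" + wal.postAppendCalls);
  }
}
```

Here the empty marker still gets sequence id 2 and still runs the post-append bookkeeping, but only the non-empty entry is written. If empty edits returned early, any step added after the write later would silently be skipped for them. That is the risk the commit note mentions.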



> TestHRegion#testFlushCacheWhileScanning goes zombie
> ---------------------------------------------------
>
>                 Key: HBASE-14495
>                 URL: https://issues.apache.org/jira/browse/HBASE-14495
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Assignee: stack
>         Attachments: 14495.txt, 14495.txt, 14495v3.txt, 14495v6.txt, 14495v7.txt, 14495v9.txt
>
>
> This test goes zombie on us, most recently here:
> https://builds.apache.org/job/PreCommit-HBASE-Build/15744//console
> It does not fail on my internal rig runs, nor locally on my laptop when run in a loop.
> It's hung up in close of the region:
> {code}
> "main" prio=10 tid=0x00007fc49800a800 nid=0x6053 in Object.wait() [0x00007fc4a02c9000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000007d07c3478> (a java.lang.Object)
>       at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:207)
>       - locked <0x00000007d07c3478> (a java.lang.Object)
>       at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:143)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2257)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2061)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2026)
>       at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2016)
>       at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1423)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
>       - locked <0x00000007d07c34a8> (a java.lang.Object)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1295)
>       at org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:352)
>       at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)
> {code}
> It is waiting on mvcc to catch up.
> There is this comment at the point where we are hung:
> {code}
>             // TODO: Lets see if we hang here, if there is a scenario where an outstanding reader
>             // with a read point is in advance of this write point.
>             mvcc.completeAndWait(writeEntry);
> {code}
> The above came in with HBASE-12751. The comment was added at v29:
> https://issues.apache.org/jira/secure/attachment/12754775/12751.rebased.v29.txt
> Looks like I added it, so I must have suspected that this might be dodgy...
> Let me take a look...
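The hang in the thread dump above can be reproduced in miniature. `completeAndWait` first completes the caller's entry and then waits for the read point to reach it. The read point, however, cannot move past an earlier entry that was begun and never completed. This is a hypothetical model, not HBase's `MultiVersionConcurrencyControl`. The `maxWaits` bound exists only so the demo terminates, and the periodic `STUCK` warning mirrors the diagnostic the patch adds.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.logging.Logger;

/** Hypothetical model of a completeAndWait() that cannot get past an mvcc hole. */
public class MvccHangDemo {
  private static final Logger LOG = Logger.getLogger(MvccHangDemo.class.getName());

  static final class WriteEntry {
    final long writeNumber;
    boolean completed;
    WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
  }

  private final Deque<WriteEntry> queue = new ArrayDeque<>();
  private final Object readWaiters = new Object();
  private long writePoint;
  private long readPoint;
  int stuckLogs; // number of STUCK warnings emitted, for the demo

  WriteEntry begin() {
    synchronized (readWaiters) {
      WriteEntry e = new WriteEntry(++writePoint);
      queue.addLast(e);
      return e;
    }
  }

  void complete(WriteEntry e) {
    synchronized (readWaiters) {
      e.completed = true;
      // The read point only passes a contiguous run of completed entries.
      while (!queue.isEmpty() && queue.peekFirst().completed) {
        readPoint = queue.pollFirst().writeNumber;
      }
      readWaiters.notifyAll();
    }
  }

  /** Returns true if the read point caught up, false if we gave up after maxWaits waits. */
  boolean completeAndWait(WriteEntry e, long waitMillis, int logEvery, int maxWaits) {
    complete(e);
    boolean interrupted = false;
    int count = 0;
    try {
      synchronized (readWaiters) {
        while (readPoint < e.writeNumber) {
          if (count > 0 && count % logEvery == 0) {
            stuckLogs++;
            LOG.warning("STUCK: writeNumber=" + e.writeNumber + ", readPoint=" + readPoint);
          }
          if (count >= maxWaits) {
            return false;
          }
          count++;
          try {
            readWaiters.wait(waitMillis);
          } catch (InterruptedException ie) {
            interrupted = true; // keep waiting; restore the flag on the way out
          }
        }
        return true;
      }
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) {
    MvccHangDemo mvcc = new MvccHangDemo();
    WriteEntry hole = mvcc.begin();  // write 1: begun, never completed
    WriteEntry flush = mvcc.begin(); // write 2: e.g. the flush's own entry
    boolean caughtUp = mvcc.completeAndWait(flush, 1, 10, 30);
    System.out.println("caughtUp=" + caughtUp + ", readPoint=" + mvcc.readPoint
        + ", STUCK logs=" + mvcc.stuckLogs);
    mvcc.complete(hole); // closing the hole lets the read point advance
    System.out.println("after completing the hole, readPoint=" + mvcc.readPoint);
  }
}
```

Running `main` should print `caughtUp=false, readPoint=0, STUCK logs=3`. Once the outstanding entry is completed, it prints `after completing the hole, readPoint=2`. The STUCK lines themselves go to the default `java.util.logging` console handler on stderr.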



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
