[ 
https://issues.apache.org/jira/browse/HBASE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981948#comment-14981948
 ] 

ramkrishna.s.vasudevan commented on HBASE-14495:
------------------------------------------------

{code}
015-10-30 10:38:12,446 DEBUG [FlushThread] regionserver.HRegionFileSystem(382): 
Committing store file 
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/7a13a0ecbc5140b0a37ea6e76055f994
 as 
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/7a13a0ecbc5140b0a37ea6e76055f994
2015-10-30 10:38:12,451 INFO  [FlushThread] regionserver.HStore(987): Added 
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/7a13a0ecbc5140b0a37ea6e76055f994,
 entries=24, sequenceid=953, filesize=5.9 K
2015-10-30 10:38:12,452 INFO  [FlushThread] regionserver.HRegion(2443): 
Finished memstore flush of ~4.13 KB/4224, currentsize=2.58 KB/2640 for region 
testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64. in 
42ms, sequenceid=953, compaction requested=true
2015-10-30 10:38:12,454 INFO  [FlushThread] regionserver.HRegion(2146): 
Flushing 1/1 column families, memstore=2.58 KB
2015-10-30 10:38:12,454 INFO  [main] regionserver.HRegion(7836): writing data 
to region 
testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64. 
with WAL disabled. Data may be lost in the event of a crash.
2015-10-30 10:38:12,458 DEBUG [FlushThread] hfile.HFile$WriterFactory(309): 
Unable to set drop behind on 
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
2015-10-30 10:38:12,461 INFO  [FlushThread] 
regionserver.DefaultStoreFlusher(90): Flushed, sequenceid=972, memsize=2.6 K, 
hasBloomFilter=true, into tmp file 
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
2015-10-30 10:38:12,491 DEBUG [FlushThread] 
regionserver.HRegionFileSystem(382): Committing store file 
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
 as 
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/fb0fa5a1b14d4efa9efbf392268b3be1
2015-10-30 10:38:12,569 ERROR [FlushThread] 
regionserver.HStore$StoreFlusherImpl(2147): Failed to commit store file 
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
Trailer from file 
file:/C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/fb0fa5a1b14d4efa9efbf392268b3be1
        at 
org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:517)
        at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:546)
        at 
org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1140)
        at 
org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:265)
        at 
org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:404)
        at 
org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:515)
        at 
org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:505)
        at 
org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:672)
        at 
org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:664)
        at 
org.apache.hadoop.hbase.regionserver.HStore.commitFile(HStore.java:980)
        at 
org.apache.hadoop.hbase.regionserver.HStore.access$200(HStore.java:119)
        at 
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2145)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2352)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2078)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2040)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1931)
        at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1857)
        at 
org.apache.hadoop.hbase.regionserver.TestHRegion$FlushThread.run(TestHRegion.java:3797)
Caused by: java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown 
Source)
        at sun.nio.ch.FileChannelImpl.position(Unknown Source)
        at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.seek(RawLocalFileSystem.java:115)
        at 
org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
        at 
org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
        at 
org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
        at 
org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:325)
        at 
org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:392)
        at 
org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:501)
        ... 17 more
2015-10-30 10:38:12,573 DEBUG [FlushThread] wal.WALKey(102): 
mvcc=MultiVersionConcurrencyControl{readPoint=1052, writePoint=1052}, 
writeEntry=null
2015-10-30 10:38:12,573 WARN  [FlushThread] regionserver.HRegion(2390): 
9e577f13d0d4caa7d069cc53cd18ae64 : failed writing ABORT_FLUSH marker to WAL
java.io.InterruptedIOException
        at org.apache.hadoop.hbase.wal.WALKey.getWriteEntry(WALKey.java:108)
        at 
org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeMarker(WALUtil.java:131)
        at 
org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeFlushMarker(WALUtil.java:75)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2387)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2078)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2040)
        at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1931)
        at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1857)
        at 
org.apache.hadoop.hbase.regionserver.TestHRegion$FlushThread.run(TestHRegion.java:3797)
Caused by: java.lang.InterruptedException
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown
 Source)
        at java.util.concurrent.CountDownLatch.await(Unknown Source)
        at org.apache.hadoop.hbase.wal.WALKey.getWriteEntry(WALKey.java:98)
        ... 8 more
Already started clsoing
2015-10-30 10:38:12,576 DEBUG [main] regionserver.HRegion(1384): Closing 
testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64.: 
disabling compactions & flushes
2015-10-30 10:38:12,578 DEBUG [main] regionserver.HRegion(1411): Updates 
disabled for region 
testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64.
2015-10-30 10:38:12,578 INFO  [main] regionserver.HRegion(2146): Flushing 1/1 
column families, memstore=16.16 KB
2015-10-30 10:38:12,579 WARN  [main] regionserver.DefaultMemStore(152): 
Snapshot called again without clearing previous. Doing nothing. Another ongoing 
flush or did we fail last attempt?
2015-10-30 10:38:13,609 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:14,633 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:15,667 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:16,700 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:17,730 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:18,763 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:19,788 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:20,817 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:21,844 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:22,874 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:23,905 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:24,929 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:25,959 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:26,987 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:28,019 WARN  [main] 
regionserver.MultiVersionConcurrencyControl(214): STUCK: 
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
{code}

I don't have a dump but the logs that shows the issue. 

> TestHRegion#testFlushCacheWhileScanning goes zombie
> ---------------------------------------------------
>
>                 Key: HBASE-14495
>                 URL: https://issues.apache.org/jira/browse/HBASE-14495
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0
>
>         Attachments: 14495.txt, 14495.txt, 14495v3.txt, 14495v6.txt, 
> 14495v7.txt, 14495v9.txt
>
>
> This test goes zombie on us, most recently, here: 
> https://builds.apache.org/job/PreCommit-HBASE-Build/15744//console
> It does not fail on my internal rig runs nor locally on laptop when run in a 
> loop.
> Its hung up in close of the region:
> {code}
> "main" prio=10 tid=0x00007fc49800a800 nid=0x6053 in Object.wait() 
> [0x00007fc4a02c9000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000007d07c3478> (a java.lang.Object)
>       at 
> org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:207)
>       - locked <0x00000007d07c3478> (a java.lang.Object)
>       at 
> org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:143)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2257)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2061)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2026)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2016)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1423)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
>       - locked <0x00000007d07c34a8> (a java.lang.Object)
>       at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1295)
>       at 
> org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:352)
>       at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)
> {code}
> It is waiting on mvcc to catch up.
> There is this comment at the point where we are hung:
>             // TODO: Lets see if we hang here, if there is a scenario where 
> an outstanding reader
>             // with a read point is in advance of this write point.
>             mvcc.completeAndWait(writeEntry);
> The above came in with HBASE-12751. The comment was added at v29:
> https://issues.apache.org/jira/secure/attachment/12754775/12751.rebased.v29.txt
> Looks like I added it so must have had predilection that this might be 
> dodgy... Let me take a look... 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to