[
https://issues.apache.org/jira/browse/HBASE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981948#comment-14981948
]
ramkrishna.s.vasudevan commented on HBASE-14495:
------------------------------------------------
{code}
015-10-30 10:38:12,446 DEBUG [FlushThread] regionserver.HRegionFileSystem(382):
Committing store file
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/7a13a0ecbc5140b0a37ea6e76055f994
as
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/7a13a0ecbc5140b0a37ea6e76055f994
2015-10-30 10:38:12,451 INFO [FlushThread] regionserver.HStore(987): Added
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/7a13a0ecbc5140b0a37ea6e76055f994,
entries=24, sequenceid=953, filesize=5.9 K
2015-10-30 10:38:12,452 INFO [FlushThread] regionserver.HRegion(2443):
Finished memstore flush of ~4.13 KB/4224, currentsize=2.58 KB/2640 for region
testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64. in
42ms, sequenceid=953, compaction requested=true
2015-10-30 10:38:12,454 INFO [FlushThread] regionserver.HRegion(2146):
Flushing 1/1 column families, memstore=2.58 KB
2015-10-30 10:38:12,454 INFO [main] regionserver.HRegion(7836): writing data
to region
testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64.
with WAL disabled. Data may be lost in the event of a crash.
2015-10-30 10:38:12,458 DEBUG [FlushThread] hfile.HFile$WriterFactory(309):
Unable to set drop behind on
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
2015-10-30 10:38:12,461 INFO [FlushThread]
regionserver.DefaultStoreFlusher(90): Flushed, sequenceid=972, memsize=2.6 K,
hasBloomFilter=true, into tmp file
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
2015-10-30 10:38:12,491 DEBUG [FlushThread]
regionserver.HRegionFileSystem(382): Committing store file
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
as
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/fb0fa5a1b14d4efa9efbf392268b3be1
2015-10-30 10:38:12,569 ERROR [FlushThread]
regionserver.HStore$StoreFlusherImpl(2147): Failed to commit store file
C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/.tmp/fb0fa5a1b14d4efa9efbf392268b3be1
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile
Trailer from file
file:/C:/commit_versions/hbase/hbase-server/target/test-data/7db8e7c9-166b-45eb-b589-027946092beb/data/default/testFlushCacheWhileScanning/9e577f13d0d4caa7d069cc53cd18ae64/family/fb0fa5a1b14d4efa9efbf392268b3be1
at
org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:517)
at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:546)
at
org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1140)
at
org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:265)
at
org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:404)
at
org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:515)
at
org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:505)
at
org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:672)
at
org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:664)
at
org.apache.hadoop.hbase.regionserver.HStore.commitFile(HStore.java:980)
at
org.apache.hadoop.hbase.regionserver.HStore.access$200(HStore.java:119)
at
org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2145)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2352)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2078)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2040)
at
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1931)
at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1857)
at
org.apache.hadoop.hbase.regionserver.TestHRegion$FlushThread.run(TestHRegion.java:3797)
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown
Source)
at sun.nio.ch.FileChannelImpl.position(Unknown Source)
at
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.seek(RawLocalFileSystem.java:115)
at
org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
at
org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
at
org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
at
org.apache.hadoop.fs.ChecksumFileSystem$FSDataBoundedInputStream.seek(ChecksumFileSystem.java:325)
at
org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:392)
at
org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:501)
... 17 more
2015-10-30 10:38:12,573 DEBUG [FlushThread] wal.WALKey(102):
mvcc=MultiVersionConcurrencyControl{readPoint=1052, writePoint=1052},
writeEntry=null
2015-10-30 10:38:12,573 WARN [FlushThread] regionserver.HRegion(2390):
9e577f13d0d4caa7d069cc53cd18ae64 : failed writing ABORT_FLUSH marker to WAL
java.io.InterruptedIOException
at org.apache.hadoop.hbase.wal.WALKey.getWriteEntry(WALKey.java:108)
at
org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeMarker(WALUtil.java:131)
at
org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeFlushMarker(WALUtil.java:75)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2387)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2078)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2040)
at
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1931)
at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1857)
at
org.apache.hadoop.hbase.regionserver.TestHRegion$FlushThread.run(TestHRegion.java:3797)
Caused by: java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown
Source)
at java.util.concurrent.CountDownLatch.await(Unknown Source)
at org.apache.hadoop.hbase.wal.WALKey.getWriteEntry(WALKey.java:98)
... 8 more
Already started clsoing
2015-10-30 10:38:12,576 DEBUG [main] regionserver.HRegion(1384): Closing
testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64.:
disabling compactions & flushes
2015-10-30 10:38:12,578 DEBUG [main] regionserver.HRegion(1411): Updates
disabled for region
testFlushCacheWhileScanning,,1446181690193.9e577f13d0d4caa7d069cc53cd18ae64.
2015-10-30 10:38:12,578 INFO [main] regionserver.HRegion(2146): Flushing 1/1
column families, memstore=16.16 KB
2015-10-30 10:38:12,579 WARN [main] regionserver.DefaultMemStore(152):
Snapshot called again without clearing previous. Doing nothing. Another ongoing
flush or did we fail last attempt?
2015-10-30 10:38:13,609 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:14,633 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:15,667 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:16,700 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:17,730 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:18,763 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:19,788 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:20,817 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:21,844 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:22,874 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:23,905 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:24,929 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:25,959 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:26,987 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
2015-10-30 10:38:28,019 WARN [main]
regionserver.MultiVersionConcurrencyControl(214): STUCK:
MultiVersionConcurrencyControl{readPoint=1052, writePoint=1056}
{code}
I don't have a dump but the logs that shows the issue.
> TestHRegion#testFlushCacheWhileScanning goes zombie
> ---------------------------------------------------
>
> Key: HBASE-14495
> URL: https://issues.apache.org/jira/browse/HBASE-14495
> Project: HBase
> Issue Type: Sub-task
> Components: test
> Reporter: stack
> Assignee: stack
> Fix For: 2.0.0
>
> Attachments: 14495.txt, 14495.txt, 14495v3.txt, 14495v6.txt,
> 14495v7.txt, 14495v9.txt
>
>
> This test goes zombie on us, most recently, here:
> https://builds.apache.org/job/PreCommit-HBASE-Build/15744//console
> It does not fail on my internal rig runs nor locally on laptop when run in a
> loop.
> Its hung up in close of the region:
> {code}
> "main" prio=10 tid=0x00007fc49800a800 nid=0x6053 in Object.wait()
> [0x00007fc4a02c9000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000007d07c3478> (a java.lang.Object)
> at
> org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:207)
> - locked <0x00000007d07c3478> (a java.lang.Object)
> at
> org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:143)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2257)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2061)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2026)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2016)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1423)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1344)
> - locked <0x00000007d07c34a8> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1295)
> at
> org.apache.hadoop.hbase.HBaseTestingUtility.closeRegionAndWAL(HBaseTestingUtility.java:352)
> at
> org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)
> {code}
> It is waiting on mvcc to catch up.
> There is this comment at the point where we are hung:
> // TODO: Lets see if we hang here, if there is a scenario where
> an outstanding reader
> // with a read point is in advance of this write point.
> mvcc.completeAndWait(writeEntry);
> The above came in with HBASE-12751. The comment was added at v29:
> https://issues.apache.org/jira/secure/attachment/12754775/12751.rebased.v29.txt
> Looks like I added it so must have had predilection that this might be
> dodgy... Let me take a look...
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)