[
https://issues.apache.org/jira/browse/HBASE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704942#comment-13704942
]
stack commented on HBASE-8920:
------------------------------
So, here is the testClose finishing up:
{code}
2013-07-10 11:58:46,979 INFO [Thread-387]
master.TestZKBasedOpenCloseRegion(317): Done with testCloseRegion
2013-07-10 11:58:46,981 INFO [StoreOpener-953c5884dc661334f302052cfe8d547b-1]
compactions.CompactionConfiguration(85): size [134217728, 9223372036854775807);
files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point
2684354560; delete expired; major period 604800000, major jitter 0.500000
2013-07-10 11:58:46,993 DEBUG [StoreOpener-953c5884dc661334f302052cfe8d547b-1]
regionserver.HStore(443): loaded
hdfs://localhost:56405/user/jenkins/hbase/TestZKBasedOpenCloseRegion/953c5884dc661334f302052cfe8d547b/a/fb400fbb145949dcb3563bd7b5a3c9ab,
isReference=false, isBulkLoadResult=false, seqid=2, majorCompaction=false
2013-07-10 11:58:46,994 INFO [StoreOpener-953c5884dc661334f302052cfe8d547b-1]
compactions.CompactionConfiguration(85): size [134217728, 9223372036854775807);
files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point
2684354560; delete expired; major period 604800000, major jitter 0.500000
2013-07-10 11:58:46,996 INFO [StoreOpener-953c5884dc661334f302052cfe8d547b-1]
compactions.CompactionConfiguration(85): size [134217728, 9223372036854775807);
files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point
2684354560; delete expired; major period 604800000, major jitter 0.500000
2013-07-10 11:58:47,000 INFO [RS_OPEN_REGION-juno:39691-0]
regionserver.HRegion(619): Onlined 953c5884dc661334f302052cfe8d547b; next
sequenceid=3
2013-07-10 11:58:47,000 DEBUG [RS_OPEN_REGION-juno:39691-0]
zookeeper.ZKAssign(598): regionserver:39691-0x13fc8727d2e0002 Attempting to
retransition the opening state of node 953c5884dc661334f302052cfe8d547b
2013-07-10 11:58:47,001 INFO [pool-1-thread-1] hbase.ResourceChecker(171):
after: master.TestZKBasedOpenCloseRegion#testCloseRegion Thread=281 (was 278)
Potentially hanging thread: MASTER_CLOSE_REGION-juno:45449-0
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:957)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
java.lang.Thread.run(Thread.java:662)
Potentially hanging thread: RS_CLOSE_REGION-juno:39691-0
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:957)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
java.lang.Thread.run(Thread.java:662)
Potentially hanging thread: StoreOpener-953c5884dc661334f302052cfe8d547b-1
java.util.regex.Pattern$GroupTail.match(Pattern.java:4231)
java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
java.util.regex.Pattern$Curly.match(Pattern.java:3744)
java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
java.util.regex.Pattern$Begin.match(Pattern.java:3120)
java.util.regex.Matcher.match(Matcher.java:1127)
java.util.regex.Matcher.matches(Matcher.java:502)
org.apache.hadoop.hbase.io.HFileLink.isHFileLink(HFileLink.java:140)
org.apache.hadoop.hbase.regionserver.StoreFileInfo.validateStoreFileName(StoreFileInfo.java:297)
org.apache.hadoop.hbase.regionserver.StoreFileInfo.isValid(StoreFileInfo.java:322)
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java:190)
org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:407)
org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:242)
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3104)
org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:655)
org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:653)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
java.lang.Thread.run(Thread.java:662)
- Thread LEAK? -, OpenFileDescriptor=445 (was 442) - OpenFileDescriptor LEAK?
-, MaxFileDescriptor=40000 (was 40000), SystemLoadAverage=324 (was 324),
ProcessCount=141 (was 141), AvailableMemoryMB=1280 (was 1283),
ConnectionCount=6 (was 6)
2013-07-10 11:58:47,002 INFO
[PostOpenDeployTasks:953c5884dc661334f302052cfe8d547b]
regionserver.HRegionServer(1698): Post open deploy tasks for
region=TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,009 INFO
[PostOpenDeployTasks:953c5884dc661334f302052cfe8d547b] catalog.MetaEditor(432):
Updated row
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
with server=juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,009 INFO
[PostOpenDeployTasks:953c5884dc661334f302052cfe8d547b]
regionserver.HRegionServer(1723): Done with post open deploy task for
region=TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,009 DEBUG [RS_OPEN_REGION-juno:39691-0]
zookeeper.ZKAssign(786): regionserver:39691-0x13fc8727d2e0002 Attempting to
transition node 953c5884dc661334f302052cfe8d547b from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENED
2013-07-10 11:58:47,020 DEBUG [pool-1-thread-1-EventThread]
zookeeper.ZooKeeperWatcher(307): master:45449-0x13fc8727d2e0000 Received
ZooKeeper Event, type=NodeDataChanged, state=SyncConnected,
path=/hbase/region-in-transition/953c5884dc661334f302052cfe8d547b
2013-07-10 11:58:47,020 DEBUG [RS_OPEN_REGION-juno:39691-0]
zookeeper.ZKAssign(862): regionserver:39691-0x13fc8727d2e0002 Successfully
transitioned node 953c5884dc661334f302052cfe8d547b from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENED
2013-07-10 11:58:47,020 DEBUG [RS_OPEN_REGION-juno:39691-0]
handler.OpenRegionHandler(373): region transitioned to opened in zookeeper:
{ENCODED => 953c5884dc661334f302052cfe8d547b, NAME =>
'TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.',
STARTKEY => 'bbb', ENDKEY => 'ccc'}, server:
juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,020 DEBUG [RS_OPEN_REGION-juno:39691-0]
handler.OpenRegionHandler(186): Opened
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
on server:juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,021 DEBUG [hbase-am-zkevent-worker-pool-2-thread-15]
master.AssignmentManager(768): Handling transition=RS_ZK_REGION_OPENED,
server=juno.apache.org,39691,1373457514592,
region=953c5884dc661334f302052cfe8d547b, current state from region state map
={953c5884dc661334f302052cfe8d547b state=OPENING, ts=1373457526962,
server=juno.apache.org,39691,1373457514592}
....
{code}
The region reopen is going ahead after the close succeeds.
Now, we fall into the next test, the testReOpen, the one that is failing:
{code}
...
2013-07-10 11:58:47,021 INFO [pool-1-thread-1] hbase.ResourceChecker(147):
before: master.TestZKBasedOpenCloseRegion#testReOpenRegion Thread=281,
OpenFileDescriptor=448, MaxFileDescriptor=40000, SystemLoadAverage=324,
ProcessCount=141, AvailableMemoryMB=1280, ConnectionCount=6
2013-07-10 11:58:47,021 INFO [hbase-am-zkevent-worker-pool-2-thread-15]
master.RegionStates(265): Region transitioned from
{953c5884dc661334f302052cfe8d547b state=OPENING, ts=1373457526962,
server=juno.apache.org,39691,1373457514592} to
{953c5884dc661334f302052cfe8d547b state=OPEN, ts=1373457527021,
server=juno.apache.org,39691,1373457514592}
2013-07-10 11:58:47,022 DEBUG [MASTER_OPEN_REGION-juno:45449-2]
handler.OpenedRegionHandler(145): Handling OPENED event for
953c5884dc661334f302052cfe8d547b from juno.apache.org,39691,1373457514592;
deleting unassigned node
2013-07-10 11:58:47,022 DEBUG [MASTER_OPEN_REGION-juno:45449-2]
zookeeper.ZKAssign(405): master:45449-0x13fc8727d2e0000 Deleting existing
unassigned node for 953c5884dc661334f302052cfe8d547b that is in expected state
RS_ZK_REGION_OPENED
2013-07-10 11:58:47,032 DEBUG [pool-1-thread-1] client.ClientScanner(195):
Finished region={ENCODED => 1028785192, NAME => '.META.,,1', STARTKEY => '',
ENDKEY => ''}
2013-07-10 11:58:47,032 INFO [Thread-394]
master.TestZKBasedOpenCloseRegion(109): Number of region servers = 2
2013-07-10 11:58:47,033 INFO [Thread-394]
master.TestZKBasedOpenCloseRegion(150):
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,033 DEBUG [Thread-394]
master.TestZKBasedOpenCloseRegion(116): Asking RS to close region
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,034 INFO [Thread-394]
master.TestZKBasedOpenCloseRegion(133): Unassign
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,034 DEBUG [Thread-394] master.AssignmentManager(2206):
Starting unassignment of region
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
(offlining)
2013-07-10 11:58:47,035 DEBUG [Thread-394] master.AssignmentManager(2279):
Attempting to unassign region
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
but it is already in transition (OPEN, force=false)
2013-07-10 11:58:47,036 DEBUG [pool-1-thread-1-EventThread]
zookeeper.ZooKeeperWatcher(307): master:45449-0x13fc8727d2e0000 Received
ZooKeeper Event, type=NodeDeleted, state=SyncConnected,
path=/hbase/region-in-transition/953c5884dc661334f302052cfe8d547b
2013-07-10 11:58:47,036 DEBUG [hbase-am-zkevent-worker-pool-2-thread-13]
master.AssignmentManager$4(1219): The znode of region
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
has been deleted, region state: {953c5884dc661334f302052cfe8d547b state=OPEN,
ts=1373457527021, server=juno.apache.org,39691,1373457514592}
2013-07-10 11:58:47,036 DEBUG [pool-1-thread-1-EventThread]
zookeeper.ZooKeeperWatcher(307): master:45449-0x13fc8727d2e0000 Received
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected,
path=/hbase/region-in-transition
2013-07-10 11:58:47,036 INFO [hbase-am-zkevent-worker-pool-2-thread-13]
master.AssignmentManager$4(1224): The master has opened the region
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
that was online on juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,036 DEBUG [MASTER_OPEN_REGION-juno:45449-2]
zookeeper.ZKAssign(434): master:45449-0x13fc8727d2e0000 Successfully deleted
unassigned node for region 953c5884dc661334f302052cfe8d547b in expected state
RS_ZK_REGION_OPENED
...
{code}
We are dropping the close event. It is stomped on by the successful open.
I could ensure nothing is in transition before starting the test but this seems
like a conflict we could see out on running cluster.
> TestZKBasedOpenCloseRegion.testReOpenRegion fails occasionally
> --------------------------------------------------------------
>
> Key: HBASE-8920
> URL: https://issues.apache.org/jira/browse/HBASE-8920
> Project: HBase
> Issue Type: Bug
> Reporter: stack
>
> Failed here:
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/607/testReport/org.apache.hadoop.hbase.master/TestZKBasedOpenCloseRegion/testReOpenRegion/
> and
> https://builds.apache.org/job/HBase-TRUNK/4232/testReport/org.apache.hadoop.hbase.master/TestZKBasedOpenCloseRegion/testReOpenRegion/
> In both cases, we fail for the same reason. The previous test closes a
> region. The failing test starts but we are reopening the region we'd just
> closed as we are wont to do. This is the region that gets chosen by the
> failing test to practise reopen on only it is not yet fully open so in comes
> the close and it gets stomped on by the ongoing open.
> Let me post some log.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira