[ 
https://issues.apache.org/jira/browse/HBASE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704942#comment-13704942
 ] 

stack commented on HBASE-8920:
------------------------------

So, here is the testClose finishing up:

{code}
2013-07-10 11:58:46,979 INFO  [Thread-387] 
master.TestZKBasedOpenCloseRegion(317): Done with testCloseRegion
2013-07-10 11:58:46,981 INFO  [StoreOpener-953c5884dc661334f302052cfe8d547b-1] 
compactions.CompactionConfiguration(85): size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 
2684354560; delete expired; major period 604800000, major jitter 0.500000
2013-07-10 11:58:46,993 DEBUG [StoreOpener-953c5884dc661334f302052cfe8d547b-1] 
regionserver.HStore(443): loaded 
hdfs://localhost:56405/user/jenkins/hbase/TestZKBasedOpenCloseRegion/953c5884dc661334f302052cfe8d547b/a/fb400fbb145949dcb3563bd7b5a3c9ab,
 isReference=false, isBulkLoadResult=false, seqid=2, majorCompaction=false
2013-07-10 11:58:46,994 INFO  [StoreOpener-953c5884dc661334f302052cfe8d547b-1] 
compactions.CompactionConfiguration(85): size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 
2684354560; delete expired; major period 604800000, major jitter 0.500000
2013-07-10 11:58:46,996 INFO  [StoreOpener-953c5884dc661334f302052cfe8d547b-1] 
compactions.CompactionConfiguration(85): size [134217728, 9223372036854775807); 
files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 
2684354560; delete expired; major period 604800000, major jitter 0.500000
2013-07-10 11:58:47,000 INFO  [RS_OPEN_REGION-juno:39691-0] 
regionserver.HRegion(619): Onlined 953c5884dc661334f302052cfe8d547b; next 
sequenceid=3
2013-07-10 11:58:47,000 DEBUG [RS_OPEN_REGION-juno:39691-0] 
zookeeper.ZKAssign(598): regionserver:39691-0x13fc8727d2e0002 Attempting to 
retransition the opening state of node 953c5884dc661334f302052cfe8d547b
2013-07-10 11:58:47,001 INFO  [pool-1-thread-1] hbase.ResourceChecker(171): 
after: master.TestZKBasedOpenCloseRegion#testCloseRegion Thread=281 (was 278)
Potentially hanging thread: MASTER_CLOSE_REGION-juno:45449-0
        sun.misc.Unsafe.park(Native Method)
        java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:957)
        
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
        java.lang.Thread.run(Thread.java:662)

Potentially hanging thread: RS_CLOSE_REGION-juno:39691-0
        sun.misc.Unsafe.park(Native Method)
        java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
        
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:957)
        
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
        java.lang.Thread.run(Thread.java:662)

Potentially hanging thread: StoreOpener-953c5884dc661334f302052cfe8d547b-1
        java.util.regex.Pattern$GroupTail.match(Pattern.java:4231)
        java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
        java.util.regex.Pattern$Curly.match(Pattern.java:3744)
        java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
        java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
        java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
        java.util.regex.Pattern$Begin.match(Pattern.java:3120)
        java.util.regex.Matcher.match(Matcher.java:1127)
        java.util.regex.Matcher.matches(Matcher.java:502)
        org.apache.hadoop.hbase.io.HFileLink.isHFileLink(HFileLink.java:140)
        
org.apache.hadoop.hbase.regionserver.StoreFileInfo.validateStoreFileName(StoreFileInfo.java:297)
        
org.apache.hadoop.hbase.regionserver.StoreFileInfo.isValid(StoreFileInfo.java:322)
        
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.getStoreFiles(HRegionFileSystem.java:190)
        
org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:407)
        org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:242)
        
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3104)
        org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:655)
        org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:653)
        java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        java.util.concurrent.FutureTask.run(FutureTask.java:138)
        java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        java.util.concurrent.FutureTask.run(FutureTask.java:138)
        
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        java.lang.Thread.run(Thread.java:662)
 - Thread LEAK? -, OpenFileDescriptor=445 (was 442) - OpenFileDescriptor LEAK? 
-, MaxFileDescriptor=40000 (was 40000), SystemLoadAverage=324 (was 324), 
ProcessCount=141 (was 141), AvailableMemoryMB=1280 (was 1283), 
ConnectionCount=6 (was 6)
2013-07-10 11:58:47,002 INFO  
[PostOpenDeployTasks:953c5884dc661334f302052cfe8d547b] 
regionserver.HRegionServer(1698): Post open deploy tasks for 
region=TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,009 INFO  
[PostOpenDeployTasks:953c5884dc661334f302052cfe8d547b] catalog.MetaEditor(432): 
Updated row 
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. 
with server=juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,009 INFO  
[PostOpenDeployTasks:953c5884dc661334f302052cfe8d547b] 
regionserver.HRegionServer(1723): Done with post open deploy task for 
region=TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,009 DEBUG [RS_OPEN_REGION-juno:39691-0] 
zookeeper.ZKAssign(786): regionserver:39691-0x13fc8727d2e0002 Attempting to 
transition node 953c5884dc661334f302052cfe8d547b from RS_ZK_REGION_OPENING to 
RS_ZK_REGION_OPENED
2013-07-10 11:58:47,020 DEBUG [pool-1-thread-1-EventThread] 
zookeeper.ZooKeeperWatcher(307): master:45449-0x13fc8727d2e0000 Received 
ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, 
path=/hbase/region-in-transition/953c5884dc661334f302052cfe8d547b
2013-07-10 11:58:47,020 DEBUG [RS_OPEN_REGION-juno:39691-0] 
zookeeper.ZKAssign(862): regionserver:39691-0x13fc8727d2e0002 Successfully 
transitioned node 953c5884dc661334f302052cfe8d547b from RS_ZK_REGION_OPENING to 
RS_ZK_REGION_OPENED
2013-07-10 11:58:47,020 DEBUG [RS_OPEN_REGION-juno:39691-0] 
handler.OpenRegionHandler(373): region transitioned to opened in zookeeper: 
{ENCODED => 953c5884dc661334f302052cfe8d547b, NAME => 
'TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.',
 STARTKEY => 'bbb', ENDKEY => 'ccc'}, server: 
juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,020 DEBUG [RS_OPEN_REGION-juno:39691-0] 
handler.OpenRegionHandler(186): Opened 
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. 
on server:juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,021 DEBUG [hbase-am-zkevent-worker-pool-2-thread-15] 
master.AssignmentManager(768): Handling transition=RS_ZK_REGION_OPENED, 
server=juno.apache.org,39691,1373457514592, 
region=953c5884dc661334f302052cfe8d547b, current state from region state map 
={953c5884dc661334f302052cfe8d547b state=OPENING, ts=1373457526962, 
server=juno.apache.org,39691,1373457514592}

....
{code}

The region reopen is going ahead after the close succeeds.

Now, we fall into the next test, the testReOpen, the one that is failing:

{code}
...
2013-07-10 11:58:47,021 INFO  [pool-1-thread-1] hbase.ResourceChecker(147): 
before: master.TestZKBasedOpenCloseRegion#testReOpenRegion Thread=281, 
OpenFileDescriptor=448, MaxFileDescriptor=40000, SystemLoadAverage=324, 
ProcessCount=141, AvailableMemoryMB=1280, ConnectionCount=6
2013-07-10 11:58:47,021 INFO  [hbase-am-zkevent-worker-pool-2-thread-15] 
master.RegionStates(265): Region transitioned from 
{953c5884dc661334f302052cfe8d547b state=OPENING, ts=1373457526962, 
server=juno.apache.org,39691,1373457514592} to 
{953c5884dc661334f302052cfe8d547b state=OPEN, ts=1373457527021, 
server=juno.apache.org,39691,1373457514592}
2013-07-10 11:58:47,022 DEBUG [MASTER_OPEN_REGION-juno:45449-2] 
handler.OpenedRegionHandler(145): Handling OPENED event for 
953c5884dc661334f302052cfe8d547b from juno.apache.org,39691,1373457514592; 
deleting unassigned node
2013-07-10 11:58:47,022 DEBUG [MASTER_OPEN_REGION-juno:45449-2] 
zookeeper.ZKAssign(405): master:45449-0x13fc8727d2e0000 Deleting existing 
unassigned node for 953c5884dc661334f302052cfe8d547b that is in expected state 
RS_ZK_REGION_OPENED
2013-07-10 11:58:47,032 DEBUG [pool-1-thread-1] client.ClientScanner(195): 
Finished region={ENCODED => 1028785192, NAME => '.META.,,1', STARTKEY => '', 
ENDKEY => ''}
2013-07-10 11:58:47,032 INFO  [Thread-394] 
master.TestZKBasedOpenCloseRegion(109): Number of region servers = 2
2013-07-10 11:58:47,033 INFO  [Thread-394] 
master.TestZKBasedOpenCloseRegion(150): 
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,033 DEBUG [Thread-394] 
master.TestZKBasedOpenCloseRegion(116): Asking RS to close region 
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,034 INFO  [Thread-394] 
master.TestZKBasedOpenCloseRegion(133): Unassign 
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b.
2013-07-10 11:58:47,034 DEBUG [Thread-394] master.AssignmentManager(2206): 
Starting unassignment of region 
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. 
(offlining)
2013-07-10 11:58:47,035 DEBUG [Thread-394] master.AssignmentManager(2279): 
Attempting to unassign region 
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. 
but it is already in transition (OPEN, force=false)
2013-07-10 11:58:47,036 DEBUG [pool-1-thread-1-EventThread] 
zookeeper.ZooKeeperWatcher(307): master:45449-0x13fc8727d2e0000 Received 
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, 
path=/hbase/region-in-transition/953c5884dc661334f302052cfe8d547b
2013-07-10 11:58:47,036 DEBUG [hbase-am-zkevent-worker-pool-2-thread-13] 
master.AssignmentManager$4(1219): The znode of region 
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. 
has been deleted, region state: {953c5884dc661334f302052cfe8d547b state=OPEN, 
ts=1373457527021, server=juno.apache.org,39691,1373457514592}
2013-07-10 11:58:47,036 DEBUG [pool-1-thread-1-EventThread] 
zookeeper.ZooKeeperWatcher(307): master:45449-0x13fc8727d2e0000 Received 
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, 
path=/hbase/region-in-transition
2013-07-10 11:58:47,036 INFO  [hbase-am-zkevent-worker-pool-2-thread-13] 
master.AssignmentManager$4(1224): The master has opened the region 
TestZKBasedOpenCloseRegion,bbb,1373457519402.953c5884dc661334f302052cfe8d547b. 
that was online on juno.apache.org,39691,1373457514592
2013-07-10 11:58:47,036 DEBUG [MASTER_OPEN_REGION-juno:45449-2] 
zookeeper.ZKAssign(434): master:45449-0x13fc8727d2e0000 Successfully deleted 
unassigned node for region 953c5884dc661334f302052cfe8d547b in expected state 
RS_ZK_REGION_OPENED
...
{code}

We are dropping the close event.  It is stomped on by the successful open.

I could ensure nothing is in transition before starting the test but this seems 
like a conflict we could see out on running cluster.
                
> TestZKBasedOpenCloseRegion.testReOpenRegion fails occasionally
> --------------------------------------------------------------
>
>                 Key: HBASE-8920
>                 URL: https://issues.apache.org/jira/browse/HBASE-8920
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>
> Failed here:
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/607/testReport/org.apache.hadoop.hbase.master/TestZKBasedOpenCloseRegion/testReOpenRegion/
> and
> https://builds.apache.org/job/HBase-TRUNK/4232/testReport/org.apache.hadoop.hbase.master/TestZKBasedOpenCloseRegion/testReOpenRegion/
> In both cases, we fail for the same reason.  The previous test closes a 
> region.  The failing test starts but we are reopening the region we'd just 
> closed as we are wont to do.  This is the region that gets chosen by the 
> failing test to practise reopen on only  it is not yet fully open so in comes 
> the close and it gets stomped on by the ongoing open.
> Let me post some log.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to