[
https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042715#comment-13042715
]
gaojinchao commented on HBASE-3892:
-----------------------------------
I am not familiar with unit test(It seems diffult to send a double report of a
split and test cluster function). So
I verified it by modified region server code.
Logs:
2011-06-02 19:57:49,056 INFO org.apache.hadoop.hbase.master.ServerManager:
Received REGION_SPLIT:
ufdr2,8613816508215#1308,1307014802589.c5a8a47d9c84f417b9fcc4c8019e7c7e.:
Daughters;
ufdr2,8613816508215#1308,1307015867020.37481173e31ea469bcaa310cf8d7d980.,
ufdr2,8613816595415#3432,1307015867020.afbf02ef235cabe66026f7c393d79bc0. from
C4C4.site,60020,1307015130114
2011-06-02 19:57:49,057 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x30502c278f0000 Unable to get data of znode
/hbase/unassigned/c5a8a47d9c84f417b9fcc4c8019e7c7e because node does not exist
(not necessarily an error)
2011-06-02 19:57:49,081 INFO org.apache.hadoop.hbase.master.ServerManager:
Received REGION_SPLIT:
ufdr2,8613816508215#1308,1307014802589.c5a8a47d9c84f417b9fcc4c8019e7c7e.:
Daughters;
ufdr2,8613816508215#1308,1307015867020.37481173e31ea469bcaa310cf8d7d980.,
ufdr2,8613816595415#3432,1307015867020.afbf02ef235cabe66026f7c393d79bc0. from
C4C4.site,60020,1307015130114
2011-06-02 19:57:49,083 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x30502c278f0000 Unable to get data of znode
/hbase/unassigned/c5a8a47d9c84f417b9fcc4c8019e7c7e because node does not exist
(not necessarily an error)
2011-06-02 19:57:49,083 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Trying to process the split of 37481173e31ea469bcaa310cf8d7d980, but it was
already done and one daughter is on region server
serverName=C4C4.site,60020,1307015130114, load=(requests=0, regions=0,
usedHeap=0, maxHeap=0)
2011-06-02 19:57:56,468 INFO org.apache.hadoop.hbase.master.ServerManager:
Received REGION_SPLIT:
ufdr2,8613819021840#1446,1307014756068.f865d41d918297f30b576b9ea3ccea07.:
Daughters;
ufdr2,8613819021840#1446,1307015873554.baa21e4f0cfa5840f009d0fac8e83d15.,
ufdr2,8613819104397#3916,1307015873554.fb63f608e5e37f5e85d71c925bc78010. from
C4C3.site,60020,1307015129703
2011-06-02 19:57:56,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x30502c278f0000 Unable to get data of znode
/hbase/unassigned/f865d41d918297f30b576b9ea3ccea07 because node does not exist
(not necessarily an error)
2011-06-02 19:57:56,472 INFO org.apache.hadoop.hbase.master.ServerManager:
Received REGION_SPLIT:
ufdr2,8613819021840#1446,1307014756068.f865d41d918297f30b576b9ea3ccea07.:
Daughters;
ufdr2,8613819021840#1446,1307015873554.baa21e4f0cfa5840f009d0fac8e83d15.,
ufdr2,8613819104397#3916,1307015873554.fb63f608e5e37f5e85d71c925bc78010. from
C4C3.site,60020,1307015129703
2011-06-02 19:57:56,474 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x30502c278f0000 Unable to get data of znode
/hbase/unassigned/f865d41d918297f30b576b9ea3ccea07 because node does not exist
(not necessarily an error)
2011-06-02 19:57:56,474 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Trying to process the split of baa21e4f0cfa5840f009d0fac8e83d15, but it was
already done and one daughter is on region server
serverName=C4C3.site,60020,1307015129703, load=(requests=0, regions=0,
usedHeap=0, maxHeap=0)
Thanks for your hint. It should be a 60 seconds timeout. Region server repeated
message about 60s.
2011-05-08 17:43:45,507 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at
C4C1.site:60000
2011-05-08 17:44:45,521 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at
C4C1.site:60000
It seems a race with regionsInTransition. So IPC was blocked. I try to
reproduce it.
Hmaster logs: It recieved so much message RS_ZK_REGION_CLOSED.
2011-05-08 17:43:45,157 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22fcd582836003d Retrieved 125 byte(s) of data from znode
/hbase/unassigned/83c05d9ead23d9a260edf30dc8739cf7 and set watcher;
region=ufdr,2011050802#8613815394007#0610,1304847545412.83c05d9ead23d9a260edf30dc8739cf7.,
server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSING
2011-05-08 17:43:45,525 INFO org.apache.hadoop.hbase.master.ServerManager:
Received REGION_SPLIT:
ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
Daughters;
ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
from C4C4.site,60020,1304820199467
2011-05-08 17:43:48,943 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22fcd582836003d Retrieved 125 byte(s) of data from znode
/hbase/unassigned/5e3bacf3f43b6bad874e80c2f971e632 and set watcher;
region=ufdr,2011050803#8613819042855#3419,1304844562298.5e3bacf3f43b6bad874e80c2f971e632.,
server=C4C5.site,60020,1304820199469, state=RS_ZK_REGION_CLOSED
2011-05-08 17:43:50,777 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22fcd582836003d Retrieved 125 byte(s) of data from znode
/hbase/unassigned/aaddc1ecd4ab768357b3a44df6828e23 and set watcher;
region=ufdr,2011050810#8613815306434#1853,1304823564447.aaddc1ecd4ab768357b3a44df6828e23.,
server=C4C5.site,60020,1304820199469, state=RS_ZK_REGION_CLOSED
2011-05-08 17:43:53,219 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22fcd582836003d Retrieved 125 byte(s) of data from znode
/hbase/unassigned/6bf4788ea7349a62967b6781b37cf157 and set watcher;
region=ufdr,2011050802#8613816510664#5948,1304842169086.6bf4788ea7349a62967b6781b37cf157.,
server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED
2011-05-08 17:43:55,812 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22fcd582836003d Retrieved 125 byte(s) of data from znode
/hbase/unassigned/3decca49c2dbc3f8a9d49f192b747b12 and set watcher;
region=ufdr,2011050801#8613819767209#4733,1304837619424.3decca49c2dbc3f8a9d49f192b747b12.,
server=C4C3.site,60020,1304820199474, state=RS_ZK_REGION_CLOSED
2011-05-08 17:43:56,722 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22fcd582836003d Retrieved 125 byte(s) of data from znode
/hbase/unassigned/49cec92a2e7dd6d8ef38aab6128b29f8 and set watcher;
region=ufdr,2011050802#8613815474662#2305,1304847545412.49cec92a2e7dd6d8ef38aab6128b29f8.,
server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED
2011-05-08 17:44:01,382 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22fcd582836003d Retrieved 125 byte(s) of data from znode
/hbase/unassigned/fcaaff18b3afa53e2bae3c9eb7b3aca2 and set watcher;
region=ufdr,2011050803#8613819212381#4006,1304845030265.fcaaff18b3afa53e2bae3c9eb7b3aca2.,
server=C4C5.site,60020,1304820199469, state=RS_ZK_REGION_CLOSED
2011-05-08 17:44:01,383 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22fcd582836003d Retrieved 125 byte(s) of data from znode
/hbase/unassigned/6fa30b1a2352e36e0b5c9272b61e17cf and set watcher;
region=ufdr,2011050801#8613818694123#3028,1304847154793.6fa30b1a2352e36e0b5c9272b61e17cf.,
server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED
> Table can't disable
> -------------------
>
> Key: HBASE-3892
> URL: https://issues.apache.org/jira/browse/HBASE-3892
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.3
> Reporter: gaojinchao
> Fix For: 0.90.4
>
> Attachments: AssignmentManager_90.patch,
> AssignmentManager_90v2.patch, AssignmentManager_90v3.patch, logs.rar
>
>
> In TimeoutMonitor :
> if node exists and node state is RS_ZK_REGION_CLOSED
> We should send a zk message again when close region is timeout.
> in this case, It may be loss some message.
> I See. It seems like a bug. This is my analysis.
> // disable table and master sent Close message to region server, Region state
> was set PENDING_CLOSE
> 2011-05-08 17:44:25,745 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
> serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123,
> usedHeap=4097, maxHeap=8175) for region
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> 2011-05-08 17:44:45,530 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
> Daughters;
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> from C4C4.site,60020,1304820199467
> 2011-05-08 17:45:45,542 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
> Daughters;
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> from C4C4.site,60020,1304820199467
> // received splitting message and cleared Region state (PENDING_CLOSE)
> 2011-05-08 17:46:45,303 WARN
> org.apache.hadoop.hbase.master.AssignmentManager: Overwriting
> 4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467,
> load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175)
> 2011-05-08 17:46:45,538 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
> Daughters;
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> from C4C4.site,60020,1304820199467
> 2011-05-08 17:47:45,548 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
> Daughters;
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> from C4C4.site,60020,1304820199467
> 2011-05-08 17:48:45,545 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
> Daughters;
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> from C4C4.site,60020,1304820199467
> 2011-05-08 17:49:46,108 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
> Daughters;
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> from C4C4.site,60020,1304820199467
> 2011-05-08 17:50:46,105 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
> Daughters;
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> from C4C4.site,60020,1304820199467
> 2011-05-08 17:51:46,117 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
> Daughters;
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> from C4C4.site,60020,1304820199467
> 2011-05-08 17:52:46,112 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
> Daughters;
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> from C4C4.site,60020,1304820199467
> 2011-05-08 17:52:47,309 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x22fcd582836003d Retrieved 125 byte(s) of data from znode
> /hbase/unassigned/4418fb197685a21f77e151e401cf8b66 and set watcher;
> region=ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.,
> server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED
> 2011-05-08 17:52:47,388 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned
> node: /hbase/unassigned/4418fb197685a21f77e151e401cf8b66
> (region=ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.,
> server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED)
> 2011-05-08 17:52:47,388 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_CLOSED, server=C4C4.site,60020,1304820199467,
> region=4418fb197685a21f77e151e401cf8b66
> // region server had closed region, but the region state had cleared. So it
> printed warning log.
> 2011-05-08 17:52:47,388 WARN
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region
> 4418fb197685a21f77e151e401cf8b66 from server C4C4.site,60020,1304820199467
> but region was in the state null and not in expected PENDING_CLOSE or
> CLOSING states
> 2011-05-08 17:52:47,397 WARN
> org.apache.hadoop.hbase.master.AssignmentManager: Overwriting
> 4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467,
> load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175)
> // The region state was set PENDING_CLOSE again. the table couldn't disable
> and enable.
> 2011-05-08 17:52:47,398 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of
> region
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> (offlining)
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira