[ 
https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042050#comment-13042050
 ] 

gaojinchao commented on HBASE-3892:
-----------------------------------

I keep digging, find the repeated message was sent by region server.

If regionServerReport throwed exception, Region server will connect Hmaster 
again and send message again.

//region server logs.
2011-05-08 17:43:45,507 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
C4C1.site:60000
2011-05-08 17:44:45,521 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
C4C1.site:60000
2011-05-08 17:45:45,524 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
C4C1.site:60000
2011-05-08 17:46:45,528 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
C4C1.site:60000
2011-05-08 17:47:45,531 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
C4C1.site:60000
2011-05-08 17:48:45,535 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
C4C1.site:60000
2011-05-08 17:49:46,091 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
C4C1.site:60000
2011-05-08 17:50:46,096 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
C4C1.site:60000
2011-05-08 17:51:46,099 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
C4C1.site:60000
2011-05-08 17:52:46,104 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
C4C1.site:60000

//region server code.
List<HMsg> tryRegionServerReport(final List<HMsg> outboundMessages)
  throws IOException {
    this.serverInfo.setLoad(buildServerLoad());
    this.requestCount.set(0);
    addOutboundMsgs(outboundMessages);
    HMsg [] msgs = null;
    while (!this.stopped) {
      try {
        msgs = this.hbaseMaster.regionServerReport(this.serverInfo,   
          outboundMessages.toArray(HMsg.EMPTY_HMSG_ARRAY),
          getMostLoadedRegions());
        break;
      } catch (IOException ioe) {
        if (ioe instanceof RemoteException) {
          ioe = ((RemoteException)ioe).unwrapRemoteException();
        }
        if (ioe instanceof YouAreDeadException) {
          // This will be caught and handled as a fatal error in run()
          throw ioe;
        }
        // Couldn't connect to the master, get location from zk and reconnect
        // Method blocks until new master is found or we are stopped
        getMaster();
      }
    }

Why did regionServerReport throw exception ?

It seems Hmaster was busy and IPC blocked. 

Hmaster logs:
2011-05-08 17:44:25,745 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
Responder, call regionServerReport(serverName=C4C4.site,60020,1304820199467, 
load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175), 
[Lorg.apache.hadoop.hbase.HMsg;@520ed128, 
[Lorg.apache.hadoop.hbase.HRegionInfo;@4ac5c32e) from 157.5.100.4:50187: output 
error
2011-05-08 17:44:25,745 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
handler 11 on 60000 caught: java.nio.channels.ClosedChannelException
        at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1341)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)



> Table can't disable
> -------------------
>
>                 Key: HBASE-3892
>                 URL: https://issues.apache.org/jira/browse/HBASE-3892
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: gaojinchao
>             Fix For: 0.90.4
>
>         Attachments: AssignmentManager_90.patch, 
> AssignmentManager_90v2.patch, logs.rar
>
>
> In TimeoutMonitor : 
> if node exists and node state is RS_ZK_REGION_CLOSED
> We should send a zk message again when close region is timeout.
> in this case, It may be loss some message.
> I See. It seems like a bug. This is my analysis.
> // disable table and master sent Close message to region server, Region state 
> was set PENDING_CLOSE
> 2011-05-08 17:44:25,745 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, 
> usedHeap=4097, maxHeap=8175) for region 
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
> 2011-05-08 17:44:45,530 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
>  Daughters; 
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>  
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
>  from C4C4.site,60020,1304820199467
> 2011-05-08 17:45:45,542 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
>  Daughters; 
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>  
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
>  from C4C4.site,60020,1304820199467
> // received splitting message and cleared Region state (PENDING_CLOSE)
> 2011-05-08 17:46:45,303 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 
> 4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467, 
> load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175)
> 2011-05-08 17:46:45,538 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
>  Daughters; 
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>  
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
>  from C4C4.site,60020,1304820199467
> 2011-05-08 17:47:45,548 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
>  Daughters; 
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>  
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
>  from C4C4.site,60020,1304820199467
> 2011-05-08 17:48:45,545 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
>  Daughters; 
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>  
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
>  from C4C4.site,60020,1304820199467
> 2011-05-08 17:49:46,108 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
>  Daughters; 
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>  
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
>  from C4C4.site,60020,1304820199467
> 2011-05-08 17:50:46,105 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
>  Daughters; 
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>  
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
>  from C4C4.site,60020,1304820199467
> 2011-05-08 17:51:46,117 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
>  Daughters; 
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>  
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
>  from C4C4.site,60020,1304820199467
> 2011-05-08 17:52:46,112 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
>  Daughters; 
> ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
>  
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
>  from C4C4.site,60020,1304820199467
> 2011-05-08 17:52:47,309 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> master:60000-0x22fcd582836003d Retrieved 125 byte(s) of data from znode 
> /hbase/unassigned/4418fb197685a21f77e151e401cf8b66 and set watcher; 
> region=ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.,
>  server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED
> 2011-05-08 17:52:47,388 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
> node: /hbase/unassigned/4418fb197685a21f77e151e401cf8b66 
> (region=ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.,
>  server=C4C4.site,60020,1304820199467, state=RS_ZK_REGION_CLOSED)
> 2011-05-08 17:52:47,388 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=C4C4.site,60020,1304820199467, 
> region=4418fb197685a21f77e151e401cf8b66
> // region server had closed region, but the region state had cleared. So it 
> printed warning log.
> 2011-05-08 17:52:47,388 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> 4418fb197685a21f77e151e401cf8b66 from server C4C4.site,60020,1304820199467 
> but region was in  the state null and not in expected PENDING_CLOSE or 
> CLOSING states
> 2011-05-08 17:52:47,397 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 
> 4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467, 
> load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175)
> // The region state was set PENDING_CLOSE again.  the table couldn't disable 
> and enable.
> 2011-05-08 17:52:47,398 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region 
> ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
>  (offlining)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to