[ 
https://issues.apache.org/jira/browse/HBASE-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491099#comment-13491099
 ] 

Enis Soztutar commented on HBASE-7009:
--------------------------------------

@Jimmy, are you referring to MiniHBaseCluster.getClientProtocol()? Is there any 
reason why adding this would break backward compatibility? 

Tested the patch: 
{code}
[root@ip-10-191-190-58 hbase]# bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
12/11/05 19:13:37 INFO util.ChaosMonkey: Sleeping for 17573 to add jitter
12/11/05 19:13:55 INFO util.ChaosMonkey: Performing action: Restart random region server
12/11/05 19:13:55 INFO util.ChaosMonkey: Killing region server:ip-10-72-242-62.ec2.internal,60020,1352160397949
12/11/05 19:13:55 INFO hbase.HBaseCluster: Aborting RS: ip-10-72-242-62.ec2.internal,60020,1352160397949
12/11/05 19:13:55 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:ip-10-72-242-62.ec2.internal
12/11/05 19:13:55 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/05 19:13:55 INFO hbase.HBaseCluster: Waiting service:regionserver to stop: ip-10-72-242-62.ec2.internal,60020,1352160397949
12/11/05 19:13:55 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:ip-10-72-242-62.ec2.internal
12/11/05 19:13:55 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/05 19:13:55 INFO util.ChaosMonkey: Killed region server:ip-10-72-242-62.ec2.internal,60020,1352160397949. Reported num of rs:2
12/11/05 19:13:55 INFO util.ChaosMonkey: Sleeping for:5000
12/11/05 19:14:00 INFO util.ChaosMonkey: Starting region server:ip-10-72-242-62.ec2.internal
12/11/05 19:14:00 INFO hbase.HBaseCluster: Starting RS on: ip-10-72-242-62.ec2.internal
12/11/05 19:14:00 INFO hbase.ClusterManager: Executing remote command: /root/hbase/bin/../bin/hbase-daemon.sh --config /root/hbase/bin/../conf start regionserver , hostname:ip-10-72-242-62.ec2.internal
12/11/05 19:14:02 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting regionserver, logging to /var/log/hbase/hbase-root-regionserver-ip-10-72-242-62.out

12/11/05 19:14:02 INFO util.ChaosMonkey: Started region server:ip-10-72-242-62.ec2.internal,60020,1352160397949. Reported num of rs:2
....
{code}
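
For reference, the PID lookup that ClusterManager runs remotely in the log above can be tried locally without touching a live process. This is only a sketch: the two `ps aux` lines below are made-up sample input (user, PID 4242, and command lines are assumptions for illustration), and only the filter chain itself is taken from the log.

```shell
# Hypothetical `ps aux` lines, invented for illustration only.
# The first stands in for a region server, the second for the grep process itself.
rs_line='root      4242  0.5  3.1 1234567 89012 ?  Sl  19:14  0:07 java org.apache.hadoop.hbase.regionserver.HRegionServer start'
grep_line='root      4250  0.0  0.0    9032   672 ?  S   19:14  0:00 grep regionserver'

# Same filter chain as in the log:
#   grep regionserver  keeps lines that mention the service name
#   grep -v grep       drops the grep process itself
#   tr -s ' '          squeezes runs of spaces into a single space
#   cut -d ' ' -f2     takes the second field, which is the PID column
pid=$(printf '%s\n' "$rs_line" "$grep_line" \
  | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2)

echo "$pid"
```

Note that the filter matches every process whose command line contains "regionserver", so the `xargs kill -s SIGKILL` variant in the log would hit all such processes on the host, not just one.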

The only problem is that when the master is restarted, HConnection does not 
seem to pick up the new master:
{code}
12/11/05 19:26:00 INFO util.ChaosMonkey: Killed master server:ip-10-191-190-58.ec2.internal,60000,1352160574752
12/11/05 19:26:00 INFO util.ChaosMonkey: Sleeping for:5000
12/11/05 19:26:05 INFO util.ChaosMonkey: Starting master:ip-10-191-190-58.ec2.internal
12/11/05 19:26:05 INFO hbase.HBaseCluster: Starting Master on: ip-10-191-190-58.ec2.internal
12/11/05 19:26:05 INFO hbase.ClusterManager: Executing remote command: /root/hbase/bin/../bin/hbase-daemon.sh --config /root/hbase/bin/../conf start master , hostname:ip-10-191-190-58.ec2.internal
12/11/05 19:26:06 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting master, logging to /var/log/hbase/hbase-root-master-ip-10-191-190-58.out

12/11/05 19:26:06 INFO client.HConnectionManager$HConnectionImplementation: Exception contacting master. Retrying...
java.io.IOException: Call to ip-10-191-190-58.ec2.internal/10.191.190.58:60000 failed on local exception: java.io.EOFException
12/11/05 19:27:06 WARN hbase.HBaseCluster: Master not started yet org.apache.hadoop.hbase.MasterNotRunningException
12/11/05 19:27:07 INFO util.ChaosMonkey: Started master: ip-10-191-190-58.ec2.internal,60000,1352160574752
12/11/05 19:27:07 INFO util.ChaosMonkey: Performing action: Batch restarting 50% of region servers
12/11/05 19:27:07 WARN util.ChaosMonkey: Exception occured during performing action: org.apache.hadoop.hbase.MasterNotRunningException
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:713)
        at org.apache.hadoop.hbase.client.HBaseAdmin.getMaster(HBaseAdmin.java:213)
        at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:1632)
        at org.apache.hadoop.hbase.DistributedHBaseCluster.getClusterStatus(DistributedHBaseCluster.java:68)
        at org.apache.hadoop.hbase.util.ChaosMonkey$Action.getCurrentServers(ChaosMonkey.java:141)
        at org.apache.hadoop.hbase.util.ChaosMonkey$BatchRestartRs.perform(ChaosMonkey.java:277)
        at org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy.run(ChaosMonkey.java:393)
        at java.lang.Thread.run(Thread.java:662)
{code}

I am not sure whether the problem is in the backported patch or in 0.94.3 
itself. Investigating now. 
                
> Port HBaseCluster interface/tests to 0.94
> -----------------------------------------
>
>                 Key: HBASE-7009
>                 URL: https://issues.apache.org/jira/browse/HBASE-7009
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>    Affects Versions: 0.94.3
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>             Fix For: 0.94.4
>
>         Attachments: HBASE-7009-p1.patch, HBASE-7009.patch, 
> HBASE-7009-v2-squashed.patch
>
>
> Need to port. I am porting V5 patch from the original JIRA; I have a 
> partially ported (V3) patch from Enis with protocol buffers being reverted to 
> HRegionInterface/HMasterInterface
