[
https://issues.apache.org/jira/browse/HBASE-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491099#comment-13491099
]
Enis Soztutar commented on HBASE-7009:
--------------------------------------
@Jimmy, are you referring to MiniHBaseCluster.getClientProtocol()? Is there any
reason why adding this would break BC?
Tested the patch:
{code}
[root@ip-10-191-190-58 hbase]# bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
12/11/05 19:13:37 INFO util.ChaosMonkey: Sleeping for 17573 to add jitter
12/11/05 19:13:55 INFO util.ChaosMonkey: Performing action: Restart random region server
12/11/05 19:13:55 INFO util.ChaosMonkey: Killing region server:ip-10-72-242-62.ec2.internal,60020,1352160397949
12/11/05 19:13:55 INFO hbase.HBaseCluster: Aborting RS: ip-10-72-242-62.ec2.internal,60020,1352160397949
12/11/05 19:13:55 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:ip-10-72-242-62.ec2.internal
12/11/05 19:13:55 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/05 19:13:55 INFO hbase.HBaseCluster: Waiting service:regionserver to stop: ip-10-72-242-62.ec2.internal,60020,1352160397949
12/11/05 19:13:55 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:ip-10-72-242-62.ec2.internal
12/11/05 19:13:55 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/05 19:13:55 INFO util.ChaosMonkey: Killed region server:ip-10-72-242-62.ec2.internal,60020,1352160397949. Reported num of rs:2
12/11/05 19:13:55 INFO util.ChaosMonkey: Sleeping for:5000
12/11/05 19:14:00 INFO util.ChaosMonkey: Starting region server:ip-10-72-242-62.ec2.internal
12/11/05 19:14:00 INFO hbase.HBaseCluster: Starting RS on: ip-10-72-242-62.ec2.internal
12/11/05 19:14:00 INFO hbase.ClusterManager: Executing remote command: /root/hbase/bin/../bin/hbase-daemon.sh --config /root/hbase/bin/../conf start regionserver , hostname:ip-10-72-242-62.ec2.internal
12/11/05 19:14:02 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting regionserver, logging to /var/log/hbase/hbase-root-regionserver-ip-10-72-242-62.out
12/11/05 19:14:02 INFO util.ChaosMonkey: Started region server:ip-10-72-242-62.ec2.internal,60020,1352160397949. Reported num of rs:2
....
{code}
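For readers unfamiliar with the remote command in the log above, here is a minimal Java sketch of what the `grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2` pipeline selects from `ps aux` output. The class name, method name, and sample lines are hypothetical and not part of the patch; this only illustrates the filtering, it is not how ClusterManager is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not code from HBASE-7009.
class PidFilterSketch {

    /**
     * Mirrors: grep <service> | grep -v grep | tr -s ' ' | cut -d ' ' -f2
     * Keeps lines mentioning the service, drops lines mentioning grep,
     * and returns the second space-separated field (the PID column of ps aux).
     */
    static List<String> pidsFor(String service, List<String> psLines) {
        List<String> pids = new ArrayList<String>();
        for (String line : psLines) {
            if (!line.contains(service) || line.contains("grep")) {
                continue;
            }
            // Splitting on runs of spaces matches squeezing them with tr -s ' '
            // and then cutting on a single space.
            String[] fields = line.split(" +");
            if (fields.length > 1) {
                pids.add(fields[1]);
            }
        }
        return pids;
    }

    public static void main(String[] args) {
        // Made-up ps aux lines, for demonstration only.
        List<String> sample = Arrays.asList(
            "root      4242  0.5  3.1 java -Dproc_regionserver",
            "root      4250  0.0  0.0 grep regionserver",
            "root      4100  0.2  2.0 java -Dproc_master");
        System.out.println(pidsFor("regionserver", sample));
    }
}
```

An empty result from the second (non-killing) invocation is what the "Waiting service:regionserver to stop" step appears to rely on, which matches the empty `output:` in the log.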
The only problem is that when the master is restarted, HConnection does not
seem to pick up the new master:
{code}
12/11/05 19:26:00 INFO util.ChaosMonkey: Killed master server:ip-10-191-190-58.ec2.internal,60000,1352160574752
12/11/05 19:26:00 INFO util.ChaosMonkey: Sleeping for:5000
12/11/05 19:26:05 INFO util.ChaosMonkey: Starting master:ip-10-191-190-58.ec2.internal
12/11/05 19:26:05 INFO hbase.HBaseCluster: Starting Master on: ip-10-191-190-58.ec2.internal
12/11/05 19:26:05 INFO hbase.ClusterManager: Executing remote command: /root/hbase/bin/../bin/hbase-daemon.sh --config /root/hbase/bin/../conf start master , hostname:ip-10-191-190-58.ec2.internal
12/11/05 19:26:06 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting master, logging to /var/log/hbase/hbase-root-master-ip-10-191-190-58.out
12/11/05 19:26:06 INFO client.HConnectionManager$HConnectionImplementation: Exception contacting master. Retrying...
java.io.IOException: Call to ip-10-191-190-58.ec2.internal/10.191.190.58:60000 failed on local exception: java.io.EOFException
12/11/05 19:27:06 WARN hbase.HBaseCluster: Master not started yet
org.apache.hadoop.hbase.MasterNotRunningException
12/11/05 19:27:07 INFO util.ChaosMonkey: Started master: ip-10-191-190-58.ec2.internal,60000,1352160574752
12/11/05 19:27:07 INFO util.ChaosMonkey: Performing action: Batch restarting 50% of region servers
12/11/05 19:27:07 WARN util.ChaosMonkey: Exception occured during performing action: org.apache.hadoop.hbase.MasterNotRunningException
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:713)
    at org.apache.hadoop.hbase.client.HBaseAdmin.getMaster(HBaseAdmin.java:213)
    at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:1632)
    at org.apache.hadoop.hbase.DistributedHBaseCluster.getClusterStatus(DistributedHBaseCluster.java:68)
    at org.apache.hadoop.hbase.util.ChaosMonkey$Action.getCurrentServers(ChaosMonkey.java:141)
    at org.apache.hadoop.hbase.util.ChaosMonkey$BatchRestartRs.perform(ChaosMonkey.java:277)
    at org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy.run(ChaosMonkey.java:393)
    at java.lang.Thread.run(Thread.java:662)
{code}
Not sure whether there is a problem in the backported patch, or in 0.94.3
itself. Investigating now.
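
While the root cause is open, one possible test-side mitigation is to retry the failing call a bounded number of times instead of failing the action on the first MasterNotRunningException. The sketch below is a generic retry helper, not code from the patch: the class name, method name, and parameters are hypothetical, and it assumes each attempt goes through a connection that can actually see the new master. If the stale state lives inside a cached HConnection, retrying alone would not help.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of HBASE-7009.
class MasterWaitSketch {

    /**
     * Invokes the call until it succeeds or maxAttempts is reached,
     * sleeping sleepMs between failed attempts. Rethrows the last failure.
     */
    static <T> T retry(Callable<T> call, int maxAttempts, long sleepMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a cluster status call: fails twice, then succeeds.
        final int[] calls = {0};
        String status = retry(new Callable<String>() {
            public String call() throws Exception {
                if (++calls[0] < 3) {
                    throw new java.io.IOException("master not running yet");
                }
                return "master up";
            }
        }, 5, 10L);
        System.out.println(status + " after " + calls[0] + " attempts");
    }
}
```

In ChaosMonkey the callable would wrap whatever fetches the cluster status (the stack trace above goes through DistributedHBaseCluster.getClusterStatus); how that should be wired in is left open here.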
> Port HBaseCluster interface/tests to 0.94
> -----------------------------------------
>
> Key: HBASE-7009
> URL: https://issues.apache.org/jira/browse/HBASE-7009
> Project: HBase
> Issue Type: Sub-task
> Components: test
> Affects Versions: 0.94.3
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Fix For: 0.94.4
>
> Attachments: HBASE-7009-p1.patch, HBASE-7009.patch,
> HBASE-7009-v2-squashed.patch
>
>
> Need to port. I am porting V5 patch from the original JIRA; I have a
> partially ported (V3) patch from Enis with protocol buffers being reverted to
> HRegionInterface/HMasterInterface