Duo Zhang created HBASE-29400:
---------------------------------

             Summary: RollingBatchRestartRsAction may fail to start region server
                 Key: HBASE-29400
                 URL: https://issues.apache.org/jira/browse/HBASE-29400
             Project: HBase
          Issue Type: Improvement
          Components: integration tests
            Reporter: Duo Zhang


{noformat}
2025-06-17T02:56:52,093 INFO  [ChaosMonkey-2 {}] actions.RollingBatchRestartRsAction: Killing regionserver data04,16020,1750098538006
2025-06-17T02:56:52,093 INFO  [ChaosMonkey-2 {}] hbase.DistributedHBaseCluster: Aborting RS: data04,16020,1750098538006
2025-06-17T02:56:52,093 INFO  [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executing remote command: ps ux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL, hostname:data04
2025-06-17T02:56:52,093 INFO  [ChaosMonkey-2 {}] util.Shell: Executing full command [timeout 30 /usr/bin/ssh -i /home/zhangduo/.ssh/id_rsa_cluster  -o ConnectTimeout=10 data04 "source /etc/profile; HBASE_CONF_DIR=/data/conf/hbase/conf setsid ps ux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL"]
2025-06-17T02:56:52,544 INFO  [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executed remote command, exit code:0 , output:
2025-06-17T02:56:52,544 INFO  [ChaosMonkey-2 {}] hbase.DistributedHBaseCluster: Waiting for service: regionserver to stop: data04,16020,1750098538006
2025-06-17T02:56:52,544 INFO  [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executing remote command: ps ux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2, hostname:data04
2025-06-17T02:56:52,544 INFO  [ChaosMonkey-2 {}] util.Shell: Executing full command [timeout 30 /usr/bin/ssh -i /home/zhangduo/.ssh/id_rsa_cluster  -o ConnectTimeout=10 data04 "source /etc/profile; HBASE_CONF_DIR=/data/conf/hbase/conf setsid ps ux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2"]
2025-06-17T02:56:52,803 INFO  [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executed remote command, exit code:0 , output:
2025-06-17T02:56:52,809 INFO  [ChaosMonkey-2 {}] actions.RollingBatchRestartRsAction: Killed regionserver data04,16020,1750098538006. Reported num of rs:5
2025-06-17T02:56:52,809 INFO  [ChaosMonkey-2 {}] actions.RollingBatchRestartRsAction: Sleeping for:354
2025-06-17T02:56:53,163 INFO  [ChaosMonkey-2 {}] actions.RollingBatchRestartRsAction: Starting regionserver data04:16020
2025-06-17T02:56:53,163 INFO  [ChaosMonkey-2 {}] hbase.DistributedHBaseCluster: Starting RS on: data04
2025-06-17T02:56:53,163 INFO  [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executing remote command: /home/zhangduo/packages/hbase/hbase/bin/hbase-daemon.sh  start regionserver, hostname:data04
2025-06-17T02:56:53,163 INFO  [ChaosMonkey-2 {}] util.Shell: Executing full command [timeout 30 /usr/bin/ssh -i /home/zhangduo/.ssh/id_rsa_cluster  -o ConnectTimeout=10 data04 "source /etc/profile; HBASE_CONF_DIR=/data/conf/hbase/conf setsid /home/zhangduo/packages/hbase/hbase/bin/hbase-daemon.sh  start regionserver"]
2025-06-17T02:56:53,473 INFO  [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executed remote command, exit code:0 , output:regionserver running as process 1948033. Stop it first.
{noformat}
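
In the log above, the final start command reports exit code 0 even though its output says the region server is already running, so the exit code alone does not reveal that the start was skipped. The following is a minimal sketch of how a caller could detect this case from the command output. It is not the HBase implementation or a proposed patch; the function names are hypothetical, and the message pattern is taken only from the single output line in this log, so its wording in other versions is an assumption.

```python
import re
from typing import Optional

# Pattern derived from the one output line in the log above. Whitespace is
# matched loosely because the message may be wrapped when it is captured.
ALREADY_RUNNING = re.compile(r"running as process\s+(\d+)\.\s+Stop it first")


def already_running_pid(output: str) -> Optional[int]:
    """Return the PID named in an 'already running' message, or None."""
    match = ALREADY_RUNNING.search(output)
    return int(match.group(1)) if match else None


def start_succeeded(exit_code: int, output: str) -> bool:
    """Hypothetical check: exit code 0 is necessary but not sufficient."""
    return exit_code == 0 and already_running_pid(output) is None


if __name__ == "__main__":
    output = "regionserver running as process 1948033. Stop it first."
    print(already_running_pid(output))   # PID reported by the start script
    print(start_succeeded(0, output))    # False: the start was skipped
```

A check like this would only surface the problem; deciding whether to retry the start or wait longer after the kill is a separate question.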



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
