[ https://issues.apache.org/jira/browse/AMBARI-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567837#comment-16567837 ]
Akhil S Naik commented on AMBARI-24380:
---------------------------------------
Hi,
I had a similar issue. In the Ambari HBase configuration I accidentally changed the -Xmn option in HBASE_REGIONSERVER_OPTS to -Xmn4096mm (note the extra m).
I then asked Ambari to do a rolling HBase restart, two servers at a time, tolerating up to 2 failures. Ambari took down all of my RegionServers: it reported every RegionServer restart as successful, but the servers did not actually come up, and Ambari then raised alerts saying the RegionServers were not running.
I also tried starting a RegionServer manually. It did not stay up; the error message is below:
{code:java}
[root@aanaik4 ambari-metrics-monitor]# /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config /usr/hdp/current/hbase-regionserver/conf start regionserver
starting regionserver, logging to /var/log/hbase/hbase-root-regionserver-aanaik4.openstacklocal.out
Error: VM option 'G1NewSizePercent' is experimental and must be enabled via -XX:+UnlockExperimentalVMOptions.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. The program will exit.
{code}
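A pre-restart sanity check on the options string could catch both mistakes in this thread (a malformed -Xmn size, and an experimental option placed before -XX:+UnlockExperimentalVMOptions) before any RegionServer is taken down. This is only a sketch: lint_jvm_opts is a hypothetical helper, not part of Ambari, the set of experimental options is an illustrative and incomplete assumption, and the size rule assumes heap-size flags take digits plus an optional k/m/g/t suffix.

```python
import re
from typing import List

# Assumption: an illustrative, incomplete set of HotSpot (JDK 8) experimental
# options; the error above confirms that G1NewSizePercent is one of them.
EXPERIMENTAL = {"G1NewSizePercent", "G1MaxNewSizePercent"}
UNLOCK = "-XX:+UnlockExperimentalVMOptions"
# Assumption: heap-size flags take digits plus an optional unit letter.
SIZE_FLAGS = ("-Xmn", "-Xms", "-Xmx")
SIZE_RE = re.compile(r"\d+[kKmMgGtT]?")


def xx_name(token: str) -> str:
    """Option name from -XX:Name=value, -XX:+Name or -XX:-Name."""
    body = token[len("-XX:"):]
    if body.startswith(("+", "-")):
        body = body[1:]
    return body.split("=", 1)[0]


def lint_jvm_opts(opts: str) -> List[str]:
    """Return human-readable problems found in a JVM options string."""
    problems = []
    unlocked = False
    for token in opts.split():
        if token == UNLOCK:
            unlocked = True
        elif token.startswith("-XX:") and xx_name(token) in EXPERIMENTAL and not unlocked:
            # Experimental option seen before the unlock flag: the JVM refuses to start.
            problems.append(f"{token} appears before {UNLOCK}")
        for flag in SIZE_FLAGS:
            if token.startswith(flag) and not SIZE_RE.fullmatch(token[len(flag):]):
                problems.append(f"{token} is not a valid size")
    return problems


print(lint_jvm_opts("-Xmn4096mm -XX:G1NewSizePercent=3 -XX:+UnlockExperimentalVMOptions"))
# ['-Xmn4096mm is not a valid size', '-XX:G1NewSizePercent=3 appears before -XX:+UnlockExperimentalVMOptions']
```

Running such a check on HBASE_OPTS and HBASE_REGIONSERVER_OPTS before the first batch would turn a cluster-wide outage into a refused restart.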
Root cause:
Ambari does not check whether the RegionServer PID is created. It just executes the start command (start regionserver) and marks the task as successful.
Code reference:
https://github.com/apache/ambari/blob/79cca1c7184f1661236971dac70d85a83fab6c11/ambari-server/src/main/resources/common-services/HBASE/2.0.0.3.0/package/scripts/hbase_service.py#L42
{code:python}
try:
  Execute ( daemon_cmd,
    not_if = no_op_test,
    user = params.hbase_user
  )
except:
  show_logs(params.log_dir, params.hbase_user)
  raise
{code}
I believe my issue and yours are the same; both would be fixed by a change to this part of the code.
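One possible tweak, sketched in plain Python rather than Ambari's resource_management API: after the start command returns, poll the RegionServer's PID file and fail the task unless the process stays alive for a short settle period. read_pid, process_alive and stays_running are hypothetical helpers, the PID file path is left to the caller, the 30-second settle default is an arbitrary assumption, and the liveness probe is POSIX-only.

```python
import os
import time
from typing import Optional


def read_pid(pid_file: str) -> Optional[int]:
    """Return the positive PID stored in pid_file, or None if missing or invalid."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def process_alive(pid: int) -> bool:
    """Probe with signal 0, which checks existence without signalling (POSIX)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stays_running(pid_file: str, settle: float = 30.0, interval: float = 1.0) -> bool:
    """True only if pid_file names a live process at every poll for `settle` seconds.

    A single check right after launch may pass before a misconfigured JVM
    has had time to exit, so the process is re-checked until the deadline.
    """
    deadline = time.monotonic() + settle
    while True:
        pid = read_pid(pid_file)
        if pid is None or not process_alive(pid):
            return False  # no PID recorded, or the process has already exited
        if time.monotonic() >= deadline:
            return True
        time.sleep(interval)
```

In hbase_service.py the result would be checked right after Execute(daemon_cmd, ...); on False, the task could call show_logs(...) and raise, so that Ambari marks that restart as failed instead of successful.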
> Ambari HBase Rolling Restart failed to check RegionServers restarted
> successfully, continued to take down rest of RegionServers!
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: AMBARI-24380
> URL: https://issues.apache.org/jira/browse/AMBARI-24380
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.5.2
> Reporter: Hari Sekhon
> Priority: Critical
>
> Ambari rolling-restart of HBase RegionServers failed to detect that
> RegionServers were not coming back online, continued to take down the rest of
> the RegionServers in the cluster.
> Adding the following JVM tuning setting to HBASE_OPTS in hbase-env.sh
> template near the start of the options:
> {code:java}
> -XX:G1NewSizePercent=3{code}
> before the following option (which was set a couple of options further along;
> it needs to go after this option):
> {code:java}
> -XX:+UnlockExperimentalVMOptions{code}
> This resulted in both HMaster and RegionServer startup failures, but Ambari
> did not detect that the RegionServers were not coming back online, and
> proceeded to take down the rest of the RegionServers.
> Ambari should have checked that the first RegionServer restarted successfully
> and stayed up for the default 120 second rolling window before moving on to
> the second RegionServer.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)