Dmitry Lysnichenko created AMBARI-22060:
-------------------------------------------
Summary: Fail to restart Ranger Admin during HDP downgrade.
Key: AMBARI-22060
URL: https://issues.apache.org/jira/browse/AMBARI-22060
Project: Ambari
Issue Type: Bug
Reporter: Dmitry Lysnichenko
Assignee: Dmitry Lysnichenko
Priority: Critical

During the downgrade process, the following error occurs while Ranger Admin is restarting:
{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/RANGER/0.4.0/package/scripts/ranger_admin.py", line 216, in <module>
    RangerAdmin().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 850, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/RANGER/0.4.0/package/scripts/ranger_admin.py", line 93, in start
    setup_ranger_audit_solr()
  File "/var/lib/ambari-agent/cache/common-services/RANGER/0.4.0/package/scripts/setup_ranger_xml.py", line 705, in setup_ranger_audit_solr
    new_service_principals = [params.ranger_admin_jaas_principal])
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/solr_cloud_util.py", line 329, in add_solr_roles
    new_service_users.append(__remove_host_from_principal(new_service_user, kerberos_realm))
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/solr_cloud_util.py", line 266, in __remove_host_from_principal
    if not realm:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/config_dictionary.py", line 73, in __getattr__
    raise Fail("Configuration parameter '" + self.name + "' was not found in configurations dictionary!")
resource_management.core.exceptions.Fail: Configuration parameter 'kerberos-env' was not found in configurations dictionary!
{code}
The reason was that the server did not have many configs selected, and therefore did not send them to the agent during the downgrade.
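To illustrate the failure mode in the traceback, here is a minimal, self-contained sketch of how a placeholder for a missing config type can blow up on a simple truth test such as `if not realm:`. It is not the Ambari implementation: `MissingConfig`, `get_config` and `remove_host_from_principal` are hypothetical stand-ins, the principal-stripping logic is a guess at the intent, and the local `Fail` class only imitates `resource_management.core.exceptions.Fail`.

```python
class Fail(Exception):
    """Local stand-in for resource_management.core.exceptions.Fail."""


class MissingConfig(object):
    """Hypothetical placeholder for a config type the server never sent."""

    def __init__(self, name):
        self.name = name

    def _fail(self):
        raise Fail("Configuration parameter '" + self.name +
                   "' was not found in configurations dictionary!")

    def __getattr__(self, attr):
        # Only reached for attributes that normal lookup cannot find.
        self._fail()

    def __bool__(self):
        # Python 3 does not route truth tests through __getattr__,
        # so the sketch raises explicitly here.
        self._fail()

    __nonzero__ = __bool__  # Python 2 name for the same hook


def get_config(configurations, config_type):
    """Return the requested section, or a placeholder if it is absent."""
    if config_type in configurations:
        return configurations[config_type]
    return MissingConfig(config_type)


def remove_host_from_principal(principal, realm):
    """Hypothetical simplification: 'user/host@REALM' -> 'user@REALM'."""
    if not realm:  # raises Fail when realm is a MissingConfig placeholder
        raise Fail("Kerberos realm is required")
    user = principal.split("@")[0].split("/")[0]
    return "{0}@{1}".format(user, realm)


# Assumed downgrade command payload: 'kerberos-env' was not selected on the server.
downgrade_configs = {"ranger-env": {"ranger_user": "ranger"}}
realm = get_config(downgrade_configs, "kerberos-env")

error = None
try:
    remove_host_from_principal("rangeradmin/host1.example.com@EXAMPLE.COM", realm)
except Fail as e:
    error = str(e)

print(error)
```

With a real realm string the helper returns normally, so the error is driven entirely by the missing config type rather than by the principal itself.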
There are a few issues here:
- During the upgrade from 2.4 to 2.5, finalize did not update the current cluster version. As a result, the config helpers misbehaved.
- As a result of the previous issue, some Configure tasks failed to execute.
- During the downgrade from 2.6, it looks like the cluster entity DB state was not consistent after config selection, so in some cases configs were not selected. I managed to reproduce that only once; it's a race condition that is very hard to catch or trace in a debugger.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)