[ https://issues.apache.org/jira/browse/AMBARI-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rishi Pidva reassigned AMBARI-9022:
-----------------------------------

    Assignee: Rishi Pidva

The issue is on UI side, with a potential improvement/fix on Agent code for Oozie. Will be submitting it soon.

> Kerberos config lost + Cluster outage after adding Kafka service or Oozie service (or any service?)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-9022
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9022
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent, ambari-server, security
>    Affects Versions: 1.7.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>            Assignee: Rishi Pidva
>            Priority: Blocker
>
> Adding the Kafka service to an existing kerberized HDP 2.2 cluster resulted in all the Kerberos fields in core-site.xml getting blank or literal "null" string which prevented all the HDFS and Yarn instances from restarting. This caused a major outage - lucky this cluster isn't prod but this is going to bite somebody badly.
> Error observed in NameNode log:
> {code}
> 2015-01-07 09:56:01,958 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(369)) - Clients are to use nameservice1 to access this namenode/service.
> 2015-01-07 09:56:02,055 FATAL namenode.NameNode (NameNode.java:main(1509)) - Failed to start namenode.
> java.lang.IllegalArgumentException: Invalid rule: null
>     at org.apache.hadoop.security.authentication.util.KerberosName.parseRules(KerberosName.java:331)
>     at org.apache.hadoop.security.authentication.util.KerberosName.setRules(KerberosName.java:397)
>     at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:75)
>     at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:263)
>     at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:583)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
> 2015-01-07 09:56:02,062 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
> 2015-01-07 09:56:02,064 INFO namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
> {code}
> Fields which ended up with the literal string "null" in the value field in core-site.xml:
> {code}
> hadoop.http.authentication.kerberos.keytab
> hadoop.http.authentication.kerberos.principal
> hadoop.security.auth_to_local
> {code}
> Fields which ended up blank ("") in the value field in core-site.xml:
> {code}
> hadoop.http.authentication.cookie.domain
> hadoop.http.authentication.cookie.path
> hadoop.http.authentication.kerberos.name.rules
> hadoop.http.authentication.signature.secret
> hadoop.http.authentication.signature.secret.file
> hadoop.http.authentication.signer.secret.provider
> hadoop.http.authentication.signer.secret.provider.object
> hadoop.http.authentication.token.validity
> hadoop.http.filter.initializers
> {code}
> Previous revisions showed "undefined", which was definitely not the case: for the past months this was a working, fully kerberized cluster.
> Removing the Kafka service via REST API calls and restarting ambari-server didn't make the config reappear either.
> I had to de-kerberize and then re-kerberize the whole cluster in Ambari in order to get all those 12 configuration settings re-populated.
> A remaining side effect of this bug, even after recovering the cluster, is that all the previous config revisions are now ruined: their many undefined values would prevent the cluster from starting, so they are no longer viable as a backup to revert to for any reason. There doesn't seem to be much I can do to work around that.
> Ironically, the Kafka brokers started up fine after ruining all the core components, since Kafka has no security itself.
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
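
For anyone wanting to check a core-site.xml for the symptom described in the report above (values that are blank or the literal strings "null" / "undefined"), here is a minimal sketch. It is not part of Ambari or of the eventual fix. The function name `find_suspect_properties` and the sample XML are hypothetical, written only for illustration, and the script assumes the standard Hadoop `<configuration>/<property>/<name>/<value>` layout. A blank value is not always an error, so treat the output as a list of things to review rather than a verdict.

```python
import xml.etree.ElementTree as ET


def find_suspect_properties(xml_text):
    """Return (name, value) pairs whose value is missing, blank, or a placeholder string.

    Hypothetical helper for illustration; not an Ambari or Hadoop API.
    """
    suspects = []
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = prop.findtext("value")  # None if <value> is absent, "" if empty
        normalized = (value or "").strip()
        if normalized == "" or normalized.lower() in ("null", "undefined"):
            suspects.append((name, value))
    return suspects


# Made-up sample reproducing the kind of damage described in the report.
SAMPLE = """<configuration>
  <property><name>hadoop.security.auth_to_local</name><value>null</value></property>
  <property><name>hadoop.http.authentication.cookie.domain</name><value></value></property>
  <property><name>fs.defaultFS</name><value>hdfs://nameservice1</value></property>
</configuration>"""

if __name__ == "__main__":
    for name, value in find_suspect_properties(SAMPLE):
        print(name, repr(value))
```

To run it against a real file, read the file contents and pass them in as `xml_text`; comparing the result before and after adding a service would show whether any settings were wiped.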