[ 
https://issues.apache.org/jira/browse/AMBARI-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishi Pidva reassigned AMBARI-9022:
-----------------------------------

    Assignee: Rishi Pidva

The issue is on the UI side, with a potential improvement/fix in the Agent 
code for Oozie.

I will be submitting the fix soon.

> Kerberos config lost + Cluster outage after adding Kafka service or Oozie 
> service (or any service?)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-9022
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9022
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent, ambari-server, security
>    Affects Versions: 1.7.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>            Assignee: Rishi Pidva
>            Priority: Blocker
>
> Adding the Kafka service to an existing kerberized HDP 2.2 cluster resulted 
> in all the Kerberos fields in core-site.xml being set to blank or to the 
> literal string "null", which prevented all the HDFS and YARN instances from 
> restarting. This caused a major outage. Luckily this cluster isn't prod, but 
> this is going to bite somebody badly.
> Error observed in NameNode log:
> {code}2015-01-07 09:56:01,958 INFO  namenode.NameNode (NameNode.java:setClientNamenodeAddress(369)) - Clients are to use nameservice1 to access this namenode/service.
> 2015-01-07 09:56:02,055 FATAL namenode.NameNode (NameNode.java:main(1509)) - Failed to start namenode.
> java.lang.IllegalArgumentException: Invalid rule: null
>         at org.apache.hadoop.security.authentication.util.KerberosName.parseRules(KerberosName.java:331)
>         at org.apache.hadoop.security.authentication.util.KerberosName.setRules(KerberosName.java:397)
>         at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:75)
>         at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:263)
>         at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:583)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
> 2015-01-07 09:56:02,062 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
> 2015-01-07 09:56:02,064 INFO  namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:{code}
> Fields which ended up with the literal string "null" in the value field in 
> core-site.xml: {code}hadoop.http.authentication.kerberos.keytab
> hadoop.http.authentication.kerberos.principal
> hadoop.security.auth_to_local{code}
> Fields which ended up being blank ("") for value field in core-site.xml:
> {code}hadoop.http.authentication.cookie.domain
> hadoop.http.authentication.cookie.path
> hadoop.http.authentication.kerberos.name.rules
> hadoop.http.authentication.signature.secret
> hadoop.http.authentication.signature.secret.file
> hadoop.http.authentication.signer.secret.provider
> hadoop.http.authentication.signer.secret.provider.object
> hadoop.http.authentication.token.validity
> hadoop.http.filter.initializers{code}
> Previous revisions showed undefined values, which was definitely not the 
> case: for the past months this was a working, fully kerberized cluster.
> Removing the Kafka service via REST API calls and restarting ambari-server 
> didn't make the config reappear either.
> I had to de-kerberize and then re-kerberize the whole cluster in Ambari in 
> order to get all those 12 configuration settings re-populated.
> A remaining side effect of this bug, even after recovering the cluster, is 
> that all the previous config revisions are now ruined: the many undefined 
> values would prevent the cluster from starting, so those revisions are no 
> longer viable as a backup to revert to for any reason. There doesn't seem to 
> be much I can do to work around that.
> Ironically, the Kafka brokers started up fine after ruining all the core 
> components, since Kafka has no security itself.
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
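
As a quick way to spot the symptom described in the report, the sketch below scans a core-site.xml document for properties whose value is blank or the literal string "null". It is only an illustration and not part of Ambari or of the planned fix: the function name find_blank_or_null and the SAMPLE document are made up for this example, and some properties may legitimately be blank on a healthy cluster, so hits need manual review.

```python
import xml.etree.ElementTree as ET

# Values the report describes as symptoms: blank ("") or the literal string "null".
SUSPECT_VALUES = {"", "null"}


def find_blank_or_null(core_site_xml: str) -> dict:
    """Return {property name: value} for properties with a blank or "null" value.

    Hypothetical helper for illustration; expects Hadoop-style
    <configuration><property><name/><value/></property></configuration> XML.
    """
    root = ET.fromstring(core_site_xml)
    bad = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        # A missing <value> element is treated the same as a blank one.
        value = (prop.findtext("value") or "").strip()
        if name and value in SUSPECT_VALUES:
            bad[name] = value
    return bad


# Made-up sample, using property names quoted in the issue description.
SAMPLE = """<configuration>
  <property><name>hadoop.security.auth_to_local</name><value>null</value></property>
  <property><name>hadoop.http.authentication.cookie.domain</name><value></value></property>
  <property><name>fs.defaultFS</name><value>hdfs://nameservice1</value></property>
</configuration>"""

if __name__ == "__main__":
    for name, value in sorted(find_blank_or_null(SAMPLE).items()):
        print(f"{name} = {value!r}")
```

To check a real file, read its contents and pass the text to find_blank_or_null; running such a check before restarting HDFS or YARN after a service is added would show whether the Kerberos-related values were overwritten.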



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
