[ 
https://issues.apache.org/jira/browse/AMBARI-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318438#comment-14318438
 ] 

John Speidel commented on AMBARI-9022:
--------------------------------------

This was several days ago and I have not tested again.
I had also made some small changes to my patch since then, but it is unlikely 
that those changes would have any effect on this.

No, I don't have any logs, but I can describe the steps that I took:
- Install a single-node cluster using the following blueprint (with my patch 
in place):
{code}
{
  "host_groups" : [
    {
      "name" : "host_group_1",
      "components" : [
        {
          "name" : "NODEMANAGER"
        },
        {
          "name" : "NAMENODE"
        },
        {
          "name" : "HISTORYSERVER"
        },
        {
          "name" : "ZOOKEEPER_SERVER"
        },
        {
          "name" : "SECONDARY_NAMENODE"
        },
        {
          "name" : "RESOURCEMANAGER"
        },
        {
          "name" : "APP_TIMELINE_SERVER"
        },
        {
          "name" : "DATANODE"
        },
        {
          "name" : "YARN_CLIENT"
        },
        {
          "name" : "ZOOKEEPER_CLIENT"
        },
        {
          "name" : "MAPREDUCE2_CLIENT"
        }
      ],
      "cardinality" : "1"
    }
  ],
  "Blueprints" : {
    "stack_name" : "HDP",
    "stack_version" : "2.2"
  }
}
{code}

- Manually unzip the UnlimitedJCEPolicy files
- Manually install an MIT KDC
- Using the UI, Kerberize the existing cluster
- Using the UI, add the Oozie service
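
For reference, the blueprint install in the first step can be driven entirely 
over Ambari's REST API. A minimal sketch (the host, port, cluster name, and 
FQDN below are placeholders, and authentication is omitted; the paths are the 
standard v1 blueprint endpoints):

```python
import json
import urllib.request

AMBARI = "http://ambari-host:8080/api/v1"  # placeholder host and port

def register_blueprint(name, blueprint):
    """Build the POST that registers a blueprint document with Ambari."""
    return urllib.request.Request(
        f"{AMBARI}/blueprints/{name}",
        data=json.dumps(blueprint).encode(),
        headers={"X-Requested-By": "ambari"},
        method="POST",
    )

def create_cluster(cluster, blueprint_name, host_fqdn):
    """Build the POST that instantiates a cluster from that blueprint."""
    body = {
        "blueprint": blueprint_name,
        "host_groups": [
            # Maps the blueprint's single host group onto one real host.
            {"name": "host_group_1", "hosts": [{"fqdn": host_fqdn}]},
        ],
    }
    return urllib.request.Request(
        f"{AMBARI}/clusters/{cluster}",
        data=json.dumps(body).encode(),
        headers={"X-Requested-By": "ambari"},
        method="POST",
    )

# urllib.request.urlopen(req) would actually send each request.
```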

OOZIE_SERVER failed to start, and the exception that I mentioned earlier came 
from the log that the UI exposes for the Oozie start operation.
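
That exception is the KerberosName failure quoted in the description below. A 
rough Python mimic of the validation in KerberosName.parseRules (simplified, 
not Hadoop's actual code) shows why a literal "null" value in 
hadoop.security.auth_to_local is rejected at NameNode startup:

```python
import re

# Simplified stand-in for the rule syntax accepted by Hadoop's
# org.apache.hadoop.security.authentication.util.KerberosName:
# each whitespace-separated token must be DEFAULT or a RULE:[n:pattern]
# spec with optional match and substitution parts.
RULE_RE = re.compile(r"^RULE:\[\d+:[^\]]+\](\(.*\))?(s/.*)?$")

def parse_rules(rules):
    """Validate an auth_to_local value the way parseRules roughly does."""
    parsed = []
    for token in rules.split():
        if token == "DEFAULT" or RULE_RE.match(token):
            parsed.append(token)
        else:
            # This is the path the NameNode hit: the config value was the
            # literal string "null", which is not a valid rule token.
            raise ValueError(f"Invalid rule: {token}")
    return parsed
```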


> Kerberos config lost + Cluster outage after adding Kafka service or Oozie 
> service (or any service?)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-9022
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9022
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent, ambari-server, security
>    Affects Versions: 1.7.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>            Assignee: John Speidel
>            Priority: Blocker
>
> Adding the Kafka service to an existing kerberized HDP 2.2 cluster resulted 
> in all the Kerberos fields in core-site.xml becoming blank or the literal 
> string "null", which prevented all the HDFS and YARN instances from 
> restarting. This caused a major outage - luckily this cluster isn't prod, 
> but this is going to bite somebody badly.
> Error observed in NameNode log:
> {code}2015-01-07 09:56:01,958 INFO  namenode.NameNode 
> (NameNode.java:setClientNamenodeAddress(369)) - Clients are to use 
> nameservice1 to access this namenode/service.
> 2015-01-07 09:56:02,055 FATAL namenode.NameNode (NameNode.java:main(1509)) - 
> Failed to start namenode.
> java.lang.IllegalArgumentException: Invalid rule: null
>         at 
> org.apache.hadoop.security.authentication.util.KerberosName.parseRules(KerberosName.java:331)
>         at 
> org.apache.hadoop.security.authentication.util.KerberosName.setRules(KerberosName.java:397)
>         at 
> org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:75)
>         at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:263)
>         at 
> org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:583)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
> 2015-01-07 09:56:02,062 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - 
> Exiting with status 1
> 2015-01-07 09:56:02,064 INFO  namenode.NameNode (StringUtils.java:run(659)) - 
> SHUTDOWN_MSG:{code}
> Fields which ended up being with "null" string literals in the value field in 
> core-site.xml: {code}hadoop.http.authentication.kerberos.keytab
> hadoop.http.authentication.kerberos.principal
> hadoop.security.auth_to_local{code}
> Fields which ended up being blank ("") for value field in core-site.xml:
> {code}hadoop.http.authentication.cookie.domain
> hadoop.http.authentication.cookie.path
> hadoop.http.authentication.kerberos.name.rules
> hadoop.http.authentication.signature.secret
> hadoop.http.authentication.signature.secret.file
> hadoop.http.authentication.signer.secret.provider
> hadoop.http.authentication.signer.secret.provider.object
> hadoop.http.authentication.token.validity
> hadoop.http.filter.initializers{code}
> Previous revisions showed these values as undefined, which was definitely 
> not the case: for the past months this was a fully working, kerberized 
> cluster.
> Removing the Kafka service via REST API calls and restarting ambari-server 
> didn't make the config reappear either.
> I had to de-kerberize and then re-kerberize the whole cluster in Ambari in 
> order to get all 12 of those configuration settings re-populated.
> A remaining side effect of this bug, even after recovering the cluster, is 
> that all the previous config revisions are now ruined by the many undefined 
> values that would prevent the cluster from starting, and are therefore no 
> longer viable as a backup to revert to for any reason. There doesn't seem to 
> be much I can do to work around that.
> Ironically, the Kafka brokers started up fine after ruining all the core 
> components, since Kafka has no security itself.
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
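
For reference, the REST-based Kafka removal mentioned in the description is 
typically two calls against the Ambari v1 API: stop the service, then delete 
it. A minimal sketch (host, port, and cluster name are placeholders and 
authentication is omitted; urllib.request.urlopen would actually send the 
requests):

```python
import json
import urllib.request

AMBARI = "http://ambari-host:8080/api/v1"  # placeholder host and port

def stop_service(cluster, service):
    """Build the PUT that moves a service to INSTALLED (i.e. stopped)."""
    body = {
        "RequestInfo": {"context": f"Stop {service}"},
        "Body": {"ServiceInfo": {"state": "INSTALLED"}},
    }
    return urllib.request.Request(
        f"{AMBARI}/clusters/{cluster}/services/{service}",
        data=json.dumps(body).encode(),
        headers={"X-Requested-By": "ambari"},
        method="PUT",
    )

def delete_service(cluster, service):
    """Build the DELETE that removes a stopped service from the cluster."""
    return urllib.request.Request(
        f"{AMBARI}/clusters/{cluster}/services/{service}",
        headers={"X-Requested-By": "ambari"},
        method="DELETE",
    )
```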



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
