Ottomata has submitted this change and it was merged.

Change subject: Configure YARN HA ResourceManager
......................................................................
Configure YARN HA ResourceManager

Change-Id: I614968e8892392bfa1f0cf6e579a1f79d931682a
---
M README.md
M TODO.md
M manifests/hadoop.pp
M manifests/hadoop/defaults.pp
M manifests/hadoop/master.pp
M manifests/hadoop/resourcemanager.pp
M templates/hadoop/yarn-site.xml.erb
7 files changed, 228 insertions(+), 32 deletions(-)

Approvals:
  Ottomata: Verified; Looks good to me, approved
  jenkins-bot: Verified

diff --git a/README.md b/README.md
index e8a1060..cd1ac8b 100644
--- a/README.md
+++ b/README.md
@@ -111,7 +111,9 @@
 and set up the NodeManager.  If using MRv1, this will install
 and set up the TaskTracker.
 
-## High Availability NameNode
+## High Availability
+
+### High Availability NameNode
 
 For detailed documentation, see the
 [CDH5 High Availability Guide](http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/cdh5hag_hdfs_ha_config.html).
 
@@ -202,7 +204,7 @@
 3. StandBy NameNodes
 4. Worker nodes (DataNodes)
 
-### Adding High Availability to a running cluster
+#### Adding High Availability NameNode to a running cluster
 
 Go through all of the same steps as described in the above section.  Once all of
 your puppet manifests have been applied (JournalNodes running, NameNodes running and
@@ -259,6 +261,74 @@
 with dot ('.') characters replaced with dashes ('-').
 E.g. ```namenode1-domain-org```.
 
+### High Availability YARN ResourceManager
+To configure automatic failover for the ResourceManager, you'll need a running
+zookeeper cluster.  If $resourcemanager_hosts (which defaults to the value you
+provide for $namenode_hosts) has multiple hosts and $zookeeper_hosts is set,
+then yarn-site.xml will be configured to use HA ResourceManager.
+
+This module does not support running HA ResourceManager without also running
+HA NameNodes.  Your primary NameNode and primary ResourceManager must be configured
+to run on the same host via the inclusion of the ```cdh::hadoop::master``` class.
+Make sure that the first host listed in $namenode_hosts and in $resourcemanager_hosts
+is this primary node (namenode1.domain.org in the following example).
+
+```puppet
+class my::hadoop {
+    class { 'cdh::hadoop':
+        cluster_name      => 'mycluster',
+        zookeeper_hosts   => [
+            'zk1.domain.org:2181',
+            'zk2.domain.org:2181',
+            'zk3.domain.org:2181'
+        ],
+        namenode_hosts    => [
+            'namenode1.domain.org',
+            'namenode2.domain.org'
+        ],
+        journalnode_hosts => [
+            'datanode1.domain.org',
+            'datanode2.domain.org',
+            'datanode3.domain.org'
+        ],
+        datanode_mounts   => [
+            '/var/lib/hadoop/data/a',
+            '/var/lib/hadoop/data/b',
+            '/var/lib/hadoop/data/c'
+        ],
+        dfs_name_dir      => ['/var/lib/hadoop/name', '/mnt/hadoop_name'],
+    }
+}
+```
+
+Note the differences from the non-HA RM setup:
+
+- zookeeper_hosts has been provided.  This list of hosts will be used for auto failover of the RM.
+- On your standby ResourceManagers, explicitly include ```cdh::hadoop::resourcemanager```.
+
+``` puppet
+class my::hadoop::master inherits my::hadoop {
+    include cdh::hadoop::master
+}
+class my::hadoop::standby inherits my::hadoop {
+    include cdh::hadoop::namenode::standby
+    include cdh::hadoop::resourcemanager
+}
+
+node 'namenode1.domain.org' {
+    include my::hadoop::master
+}
+
+node 'namenode2.domain.org' {
+    include my::hadoop::standby
+}
+```
+
+#### Adding High Availability YARN ResourceManager to a running cluster
+Apply the above puppetization to your nodes, and then restart all YARN services
+(ResourceManagers and NodeManagers).
+
 
 # Hive
 
 ## Hive Clients
@@ -266,7 +336,7 @@
 ```puppet
 class { 'cdh::hive':
     metastore_host  => 'hive-metastore-node.domain.org',
-    zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+    zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org', 'zk3.domain.org'],
     jdbc_password   => $secret_password,
 }
 ```
diff --git a/TODO.md b/TODO.md
index 4ad9c00..72acfb0 100644
--- a/TODO.md
+++ b/TODO.md
@@ -12,9 +12,9 @@
 - Make hadoop-metrics2.properties more configurable.
 - Support HA automatic failover.
 - HA NameNode Fencing support.
-- YARN HA
 - Create one variable for namenode address independent of nameservice_id and primary_namenode_host_
 - Spark History Server?
+- Impala documentation
 
 ## Zookeeper
 
diff --git a/manifests/hadoop.pp b/manifests/hadoop.pp
index d22fe21..e6fd118 100644
--- a/manifests/hadoop.pp
+++ b/manifests/hadoop.pp
@@ -31,6 +31,15 @@
 #   $datanode_mounts       - Array of JBOD mount points.  Hadoop datanode and
 #                            mapreduce/yarn directories will be here.
 #   $dfs_data_path         - Path relative to JBOD mount point for HDFS data directories.
+#
+#   $resourcemanager_hosts - Array of hosts on which ResourceManager is running.  If this has
+#                            more than one host in it AND $zookeeper_hosts is set, HA YARN ResourceManager
+#                            and automatic failover will be enabled.  This defaults to the value provided
+#                            for $namenode_hosts.  Please be sure to include cdh::hadoop::resourcemanager
+#                            directly on any standby RM hosts.  (The master RM will be included automatically
+#                            when you include cdh::hadoop::master.)
+#   $zookeeper_hosts       - Array of Zookeeper hosts to use for HA YARN ResourceManager.
+#                            Default: undef.
 #   $enable_jmxremote      - enables remote JMX connections for all Hadoop services.
 #                            Ports are not currently configurable.  Default: true.
 #   $yarn_local_path       - Path relative to JBOD mount point for yarn local directories.
@@ -114,6 +123,9 @@
     $datanode_mounts       = $::cdh::hadoop::defaults::datanode_mounts,
     $dfs_data_path         = $::cdh::hadoop::defaults::dfs_data_path,
 
+    $resourcemanager_hosts = $namenode_hosts,
+    $zookeeper_hosts       = $::cdh::hadoop::defaults::zookeeper_hosts,
+
     $yarn_local_path       = $::cdh::hadoop::defaults::yarn_local_path,
     $yarn_logs_path        = $::cdh::hadoop::defaults::yarn_logs_path,
     $dfs_block_size        = $::cdh::hadoop::defaults::dfs_block_size,
@@ -164,6 +176,10 @@
     # This is used in a couple of execs throughout this module.
     $dfs_name_dir_main = inline_template('<%= (@dfs_name_dir.class == Array) ?
@dfs_name_dir[0] : @dfs_name_dir %>')
 
+    # Config files are installed into a directory
+    # based on the value of $cluster_name.
+    $config_directory = "/etc/hadoop/conf.${cluster_name}"
+
     # Set a boolean used to indicate that HA NameNodes
     # are intended to be used for this cluster.  HA NameNodes
     # require the JournalNodes are configured.
@@ -171,22 +187,11 @@
         undef   => false,
         default => true,
     }
-
     # If $ha_enabled is true, use $cluster_name as $nameservice_id.
     $nameservice_id = $ha_enabled ? {
         true    => $cluster_name,
         default => undef,
     }
-
-    # Config files are installed into a directory
-    # based on the value of $cluster_name.
-    $config_directory = "/etc/hadoop/conf.${cluster_name}"
-
-    # Parameter Validation:
-    if ($ha_enabled and !$journalnode_hosts) {
-        fail('Must provide multiple $journalnode_hosts when using HA and setting $nameservice_id.')
-    }
-
     # Assume the primary namenode is the first entry in $namenode_hosts,
     # Set a variable here for reference in other classes.
     $primary_namenode_host = $namenode_hosts[0]
@@ -197,6 +202,25 @@
     # which are '.' delimited.
     $primary_namenode_id = inline_template('<%= @primary_namenode_host.tr(\'.\', \'-\') %>')
 
+
+    # Set a boolean used to indicate that HA YARN
+    # is intended to be used for this cluster.  HA YARN
+    # requires that zookeeper is configured, and that
+    # multiple ResourceManagers are specified.
+    if $ha_enabled and size($resourcemanager_hosts) > 1 and $zookeeper_hosts {
+        $yarn_ha_enabled = true
+        $yarn_cluster_id = $cluster_name
+    }
+    else {
+        $yarn_ha_enabled = false
+        $yarn_cluster_id = undef
+    }
+
+    # Assume the primary resourcemanager is the first entry in $resourcemanager_hosts.
+    # Set a variable here for reference in other classes.
+    $primary_resourcemanager_host = $resourcemanager_hosts[0]
+
+
     package { 'hadoop-client': ensure => 'installed' }
diff --git a/manifests/hadoop/defaults.pp b/manifests/hadoop/defaults.pp
index 602ceaf..8020c0c 100644
--- a/manifests/hadoop/defaults.pp
+++ b/manifests/hadoop/defaults.pp
@@ -8,6 +8,11 @@
     $datanode_mounts  = undef
     $dfs_data_path    = 'hdfs/dn'
 
+
+    # $resourcemanager_hosts is not set here, because it defaults to the user
+    # provided value of $namenode_hosts in hadoop.pp.
+    $zookeeper_hosts  = undef
+
     $yarn_local_path  = 'yarn/local'
     $yarn_logs_path   = 'yarn/logs'
     $dfs_block_size   = 67108864 # 64MB default
diff --git a/manifests/hadoop/master.pp b/manifests/hadoop/master.pp
index 9e3e368..eafa777 100644
--- a/manifests/hadoop/master.pp
+++ b/manifests/hadoop/master.pp
@@ -2,8 +2,10 @@
 # Wrapper class for Hadoop master node services:
 # - NameNode
 # - ResourceManager and HistoryServer (YARN)
-# OR
-# - JobTracker (MRv1).
+#
+# This requires that you run your primary NameNode and
+# primary ResourceManager on the same host.  Standby services
+# can be spread on any nodes.
 #
 class cdh::hadoop::master {
     Class['cdh::hadoop'] -> Class['cdh::hadoop::master']
diff --git a/manifests/hadoop/resourcemanager.pp b/manifests/hadoop/resourcemanager.pp
index 679a60e..f328583 100644
--- a/manifests/hadoop/resourcemanager.pp
+++ b/manifests/hadoop/resourcemanager.pp
@@ -3,25 +3,32 @@
 # This will create YARN HDFS directories.
 #
 class cdh::hadoop::resourcemanager {
-    Class['cdh::hadoop::namenode'] -> Class['cdh::hadoop::resourcemanager']
+    Class['cdh::hadoop'] -> Class['cdh::hadoop::resourcemanager']
 
-    # Create YARN HDFS directories.
-    # See: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_yarn_cluster_deploy.html?scroll=topic_11_4_10_unique_1
-    cdh::hadoop::directory { '/var/log/hadoop-yarn':
-        # sudo -u hdfs hdfs dfs -mkdir /var/log/hadoop-yarn
-        # sudo -u hdfs hdfs dfs -chown yarn:mapred /var/log/hadoop-yarn
-        owner   => 'yarn',
-        group   => 'mapred',
-        mode    => '0755',
-        # Make sure HDFS directories are created before
-        # resourcemanager is installed and started, but after
-        # the namenode.
-        require => [Service['hadoop-hdfs-namenode'], Cdh::Hadoop::Directory['/var/log']],
+    # In an HA YARN ResourceManager setup, this class will be included on multiple nodes.
+    # In order to have this directory check performed by only one resourcemanager,
+    # we only use it on the first node in the $resourcemanager_hosts array.
+    # This means that the Hadoop Master NameNode must be the same node as the
+    # Hadoop Master ResourceManager.
+    if !$::cdh::hadoop::yarn_ha_enabled or $::fqdn == $::cdh::hadoop::primary_resourcemanager_host {
+        # Create YARN HDFS directories.
+        # See: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_yarn_cluster_deploy.html?scroll=topic_11_4_10_unique_1
+        cdh::hadoop::directory { '/var/log/hadoop-yarn':
+            # sudo -u hdfs hdfs dfs -mkdir /var/log/hadoop-yarn
+            # sudo -u hdfs hdfs dfs -chown yarn:mapred /var/log/hadoop-yarn
+            owner   => 'yarn',
+            group   => 'mapred',
+            mode    => '0755',
+            # Make sure HDFS directories are created before
+            # resourcemanager is installed and started, but after
+            # the namenode.
+            require => [Service['hadoop-hdfs-namenode'], Cdh::Hadoop::Directory['/var/log']],
+            before  => Package['hadoop-yarn-resourcemanager'],
+        }
     }
 
     package { 'hadoop-yarn-resourcemanager':
         ensure  => 'installed',
-        require => Cdh::Hadoop::Directory['/var/log/hadoop-yarn'],
     }
 
     service { 'hadoop-yarn-resourcemanager':
diff --git a/templates/hadoop/yarn-site.xml.erb b/templates/hadoop/yarn-site.xml.erb
index 7548889..a94f089 100644
--- a/templates/hadoop/yarn-site.xml.erb
+++ b/templates/hadoop/yarn-site.xml.erb
@@ -1,3 +1,13 @@
+<%
+# Convert a hostname to a Node ID.
+# We can't use '.' characters because IDs
+# will be used in the names of some Java properties,
+# which are '.' delimited.
+def host_to_id(host)
+    host.tr('.', '-')
+end
+
+-%>
 <?xml version="1.0"?>
 <!-- NOTE: This file is managed by Puppet. -->
@@ -7,10 +17,88 @@
 
 <configuration>
 
+<% if @yarn_ha_enabled -%>
+  <property>
+    <name>yarn.resourcemanager.cluster-id</name>
+    <value><%= @yarn_cluster_id %></value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.ha.rm-ids</name>
+    <value><%= @resourcemanager_hosts.sort.collect { |host| host_to_id(host) }.join(',') %></value>
+  </property>
+
+<% if @resourcemanager_hosts.include?(@fqdn) -%>
+  <property>
+    <name>yarn.resourcemanager.ha.id</name>
+    <value><%= host_to_id(@fqdn) %></value>
+  </property>
+<% end -%>
+
+  <property>
+    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
+    <value>2000</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.ha.enabled</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.recovery.enabled</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.store.class</name>
+    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.zk-address</name>
+    <value><%= Array(@zookeeper_hosts).sort.join(',') %></value>
+  </property>
+
+  <property>
+    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
+    <value>5000</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.am.max-attempts</name>
+    <value>6</value>
+  </property>
+
+<% @resourcemanager_hosts.sort.each do |host| -%>
+  <property>
+    <name>yarn.resourcemanager.hostname.<%= host_to_id(host) %></name>
+    <value><%= host %></value>
+  </property>
+<% end # @resourcemanager_hosts.each -%>
+
+<% else -%>
   <property>
     <name>yarn.resourcemanager.hostname</name>
-    <value><%= @primary_namenode_host %></value>
+    <value><%= @primary_resourcemanager_host %></value>
   </property>
+<% end # if @yarn_ha_enabled -%>
+
 <% if @fair_scheduler_enabled -%>
   <property>

-- 
To view, visit https://gerrit.wikimedia.org/r/209019
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I614968e8892392bfa1f0cf6e579a1f79d931682a
Gerrit-PatchSet: 8
Gerrit-Project: operations/puppet/cdh
Gerrit-Branch: master
Gerrit-Owner: Ottomata <o...@wikimedia.org>
Gerrit-Reviewer: Ottomata <o...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
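[Editor's note: the ```host_to_id``` helper that the yarn-site.xml.erb diff above introduces is plain Ruby and can be exercised outside the template. A minimal sketch of what the template computes for ```yarn.resourcemanager.ha.rm-ids``` (the hostnames here are illustrative examples, not real hosts):]

```ruby
# Standalone copy of the host_to_id helper from templates/hadoop/yarn-site.xml.erb.
# RM IDs end up inside '.'-delimited Java property names
# (e.g. yarn.resourcemanager.hostname.<id>), so dots in hostnames
# are replaced with dashes.
def host_to_id(host)
  host.tr('.', '-')
end

# The template sorts the ResourceManager hosts, converts each to an ID,
# and joins them with commas for yarn.resourcemanager.ha.rm-ids:
hosts = ['namenode2.domain.org', 'namenode1.domain.org']  # example hosts
rm_ids = hosts.sort.collect { |host| host_to_id(host) }.join(',')
puts rm_ids  # namenode1-domain-org,namenode2-domain-org
```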