Ottomata has submitted this change and it was merged.

Change subject: Configure YARN HA ResourceManager
......................................................................
Configure YARN HA ResourceManager

Change-Id: I614968e8892392bfa1f0cf6e579a1f79d931682a
---
M README.md
M TODO.md
M manifests/hadoop.pp
M manifests/hadoop/defaults.pp
M manifests/hadoop/master.pp
M manifests/hadoop/resourcemanager.pp
M templates/hadoop/yarn-site.xml.erb
7 files changed, 228 insertions(+), 32 deletions(-)

Approvals:
  Ottomata: Verified; Looks good to me, approved
  jenkins-bot: Verified

diff --git a/README.md b/README.md
index e8a1060..cd1ac8b 100644
--- a/README.md
+++ b/README.md
@@ -111,7 +111,9 @@
 and set up the NodeManager.  If using MRv1, this will install
 and set up the TaskTracker.
 
-## High Availability NameNode
+## High Availability
+
+### High Availability NameNode
 
 For detailed documentation, see the
 [CDH5 High Availability Guide](http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/cdh5hag_hdfs_ha_config.html).
 
@@ -202,7 +204,7 @@
 3. StandBy NameNodes
 4. Worker nodes (DataNodes)
 
-### Adding High Availability to a running cluster
+#### Adding High Availability NameNode to a running cluster
 
 Go through all of the same steps as described in the above section.  Once all of
 your puppet manifests have been applied (JournalNodes running, NameNodes running and
@@ -259,6 +261,74 @@
 with dot ('.') characters replaced with dashes ('-').
 E.g. ```namenode1-domain-org```.
 
+### High Availability YARN ResourceManager
+To configure automatic failover for the ResourceManager, you'll need a running
+zookeeper cluster.  If $resourcemanager_hosts (which defaults to the value you
+provide for $namenode_hosts) has multiple hosts and $zookeeper_hosts is set,
+then yarn-site.xml will be configured to use HA ResourceManager.
+
+This module does not support running HA ResourceManager without also running
+HA NameNodes.  Your primary NameNode and primary ResourceManager must be configured
+to run on the same host via the inclusion of the ```cdh::hadoop::master``` class.
+Make sure that the first host listed in $namenode_hosts and in $resourcemanager_hosts
+is this primary node (namenode1.domain.org in the following example).
+
+```puppet
+class my::hadoop {
+    class { 'cdh::hadoop':
+        cluster_name      => 'mycluster',
+        zookeeper_hosts   => [
+            'zk1.domain.org:2181',
+            'zk2.domain.org:2181',
+            'zk3.domain.org:2181'
+        ],
+        namenode_hosts    => [
+            'namenode1.domain.org',
+            'namenode2.domain.org'
+        ],
+        journalnode_hosts => [
+            'datanode1.domain.org',
+            'datanode2.domain.org',
+            'datanode3.domain.org'
+        ],
+        datanode_mounts   => [
+            '/var/lib/hadoop/data/a',
+            '/var/lib/hadoop/data/b',
+            '/var/lib/hadoop/data/c'
+        ],
+        dfs_name_dir      => ['/var/lib/hadoop/name', '/mnt/hadoop_name'],
+    }
+}
+```
+
+Note the differences from the non-HA RM setup:
+
+- zookeeper_hosts has been provided.  This list of hosts will be used for auto failover of the RM.
+- On your standby ResourceManagers, explicitly include ```cdh::hadoop::resourcemanager```.
+
+``` puppet
+class my::hadoop::master inherits my::hadoop {
+    include cdh::hadoop::master
+}
+class my::hadoop::standby inherits my::hadoop {
+    include cdh::hadoop::namenode::standby
+    include cdh::hadoop::resourcemanager
+}
+
+node 'namenode1.domain.org' {
+    include my::hadoop::master
+}
+
+node 'namenode2.domain.org' {
+    include my::hadoop::standby
+}
+```
+
+#### Adding High Availability YARN ResourceManager to a running cluster
+Apply the above puppetization to your nodes, and then restart all YARN services
+(ResourceManagers and NodeManagers).
+
 
 # Hive
 
 ## Hive Clients
@@ -266,7 +336,7 @@
 ```puppet
 class { 'cdh::hive':
     metastore_host  => 'hive-metastore-node.domain.org',
-    zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+    zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org', 'zk3.domain.org'],
     jdbc_password   => $secret_password,
 }
 ```
diff --git a/TODO.md b/TODO.md
index 4ad9c00..72acfb0 100644
--- a/TODO.md
+++ b/TODO.md
@@ -12,9 +12,9 @@
 - Make hadoop-metrics2.properties more configurable.
 - Support HA automatic failover.
 - HA NameNode Fencing support.
-- YARN HA
 - Create one variable for namenode address independent of nameservice_id and primary_namenode_host_
 - Spark History Server?
+- Impala documentation
 
 ## Zookeeper
 
diff --git a/manifests/hadoop.pp b/manifests/hadoop.pp
index d22fe21..e6fd118 100644
--- a/manifests/hadoop.pp
+++ b/manifests/hadoop.pp
@@ -31,6 +31,15 @@
 #   $datanode_mounts       - Array of JBOD mount points.  Hadoop datanode and
 #                            mapreduce/yarn directories will be here.
 #   $dfs_data_path         - Path relative to JBOD mount point for HDFS data directories.
+#
+#   $resourcemanager_hosts - Array of hosts on which ResourceManager is running.  If this has
+#                            more than one host in it AND $zookeeper_hosts is set, HA YARN ResourceManager
+#                            and automatic failover will be enabled.  This defaults to the value provided
+#                            for $namenode_hosts.  Please be sure to include cdh::hadoop::resourcemanager
+#                            directly on any standby RM hosts.  (The master RM will be included automatically
+#                            when you include cdh::hadoop::master.)
+#   $zookeeper_hosts       - Array of Zookeeper hosts to use for HA YARN ResourceManager.
+#                            Default: undef.
 #   $enable_jmxremote      - enables remote JMX connections for all Hadoop services.
 #                            Ports are not currently configurable.  Default: true.
 #   $yarn_local_path       - Path relative to JBOD mount point for yarn local directories.
@@ -114,6 +123,9 @@
     $datanode_mounts       = $::cdh::hadoop::defaults::datanode_mounts,
     $dfs_data_path         = $::cdh::hadoop::defaults::dfs_data_path,
 
+    $resourcemanager_hosts = $namenode_hosts,
+    $zookeeper_hosts       = $::cdh::hadoop::defaults::zookeeper_hosts,
+
     $yarn_local_path       = $::cdh::hadoop::defaults::yarn_local_path,
     $yarn_logs_path        = $::cdh::hadoop::defaults::yarn_logs_path,
     $dfs_block_size        = $::cdh::hadoop::defaults::dfs_block_size,
@@ -164,6 +176,10 @@
     # This is used in a couple of execs throughout this module.
     $dfs_name_dir_main = inline_template('<%= (@dfs_name_dir.class == Array) ?
@dfs_name_dir[0] : @dfs_name_dir %>')
 
+    # Config files are installed into a directory
+    # based on the value of $cluster_name.
+    $config_directory = "/etc/hadoop/conf.${cluster_name}"
+
     # Set a boolean used to indicate that HA NameNodes
     # are intended to be used for this cluster.  HA NameNodes
     # require the JournalNodes are configured.
@@ -171,22 +187,11 @@
         undef   => false,
         default => true,
     }
-
     # If $ha_enabled is true, use $cluster_name as $nameservice_id.
     $nameservice_id = $ha_enabled ? {
         true    => $cluster_name,
         default => undef,
     }
-
-    # Config files are installed into a directory
-    # based on the value of $cluster_name.
-    $config_directory = "/etc/hadoop/conf.${cluster_name}"
-
-    # Parameter Validation:
-    if ($ha_enabled and !$journalnode_hosts) {
-        fail('Must provide multiple $journalnode_hosts when using HA and setting $nameservice_id.')
-    }
-
     # Assume the primary namenode is the first entry in $namenode_hosts,
     # Set a variable here for reference in other classes.
     $primary_namenode_host = $namenode_hosts[0]
@@ -197,6 +202,25 @@
     # which are '.' delimited.
     $primary_namenode_id = inline_template('<%= @primary_namenode_host.tr(\'.\', \'-\') %>')
 
+
+    # Set a boolean used to indicate that HA YARN
+    # is intended to be used for this cluster.  HA YARN
+    # requires that zookeeper is configured, and that
+    # multiple ResourceManagers are specified.
+    if $ha_enabled and size($resourcemanager_hosts) > 1 and $zookeeper_hosts {
+        $yarn_ha_enabled = true
+        $yarn_cluster_id = $cluster_name
+    }
+    else {
+        $yarn_ha_enabled = false
+        $yarn_cluster_id = undef
+    }
+
+    # Assume the primary resourcemanager is the first entry in $resourcemanager_hosts.
+    # Set a variable here for reference in other classes.
+    $primary_resourcemanager_host = $resourcemanager_hosts[0]
+
+
     package { 'hadoop-client': ensure => 'installed' }
diff --git a/manifests/hadoop/defaults.pp b/manifests/hadoop/defaults.pp
index 602ceaf..8020c0c 100644
--- a/manifests/hadoop/defaults.pp
+++ b/manifests/hadoop/defaults.pp
@@ -8,6 +8,11 @@
     $datanode_mounts  = undef
     $dfs_data_path    = 'hdfs/dn'
 
+
+    # $resourcemanager_hosts is not set here, because it defaults to the user
+    # provided value of $namenode_hosts in hadoop.pp.
+    $zookeeper_hosts  = undef
+
     $yarn_local_path  = 'yarn/local'
     $yarn_logs_path   = 'yarn/logs'
     $dfs_block_size   = 67108864 # 64MB default
diff --git a/manifests/hadoop/master.pp b/manifests/hadoop/master.pp
index 9e3e368..eafa777 100644
--- a/manifests/hadoop/master.pp
+++ b/manifests/hadoop/master.pp
@@ -2,8 +2,10 @@
 # Wrapper class for Hadoop master node services:
 # - NameNode
 # - ResourceManager and HistoryServer (YARN)
-# OR
-# - JobTracker (MRv1).
+#
+# This requires that you run your primary NameNode and
+# primary ResourceManager on the same host.  Standby services
+# can be spread on any nodes.
 #
 class cdh::hadoop::master {
     Class['cdh::hadoop'] -> Class['cdh::hadoop::master']
diff --git a/manifests/hadoop/resourcemanager.pp b/manifests/hadoop/resourcemanager.pp
index 679a60e..f328583 100644
--- a/manifests/hadoop/resourcemanager.pp
+++ b/manifests/hadoop/resourcemanager.pp
@@ -3,25 +3,32 @@
 # This will create YARN HDFS directories.
 #
 class cdh::hadoop::resourcemanager {
-    Class['cdh::hadoop::namenode'] -> Class['cdh::hadoop::resourcemanager']
+    Class['cdh::hadoop'] -> Class['cdh::hadoop::resourcemanager']
 
-    # Create YARN HDFS directories.
-    # See: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_yarn_cluster_deploy.html?scroll=topic_11_4_10_unique_1
-    cdh::hadoop::directory { '/var/log/hadoop-yarn':
-        # sudo -u hdfs hdfs dfs -mkdir /var/log/hadoop-yarn
-        # sudo -u hdfs hdfs dfs -chown yarn:mapred /var/log/hadoop-yarn
-        owner   => 'yarn',
-        group   => 'mapred',
-        mode    => '0755',
-        # Make sure HDFS directories are created before
-        # resourcemanager is installed and started, but after
-        # the namenode.
-        require => [Service['hadoop-hdfs-namenode'], Cdh::Hadoop::Directory['/var/log']],
+    # In an HA YARN ResourceManager setup, this class will be included on multiple nodes.
+    # In order to have this directory check performed by only one resourcemanager,
+    # we only use it on the first node in the $resourcemanager_hosts array.
+    # This means that the Hadoop Master NameNode must be the same node as the
+    # Hadoop Master ResourceManager.
+    if !$::cdh::hadoop::yarn_ha_enabled or $::fqdn == $::cdh::hadoop::primary_resourcemanager_host {
+        # Create YARN HDFS directories.
+        # See: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_yarn_cluster_deploy.html?scroll=topic_11_4_10_unique_1
+        cdh::hadoop::directory { '/var/log/hadoop-yarn':
+            # sudo -u hdfs hdfs dfs -mkdir /var/log/hadoop-yarn
+            # sudo -u hdfs hdfs dfs -chown yarn:mapred /var/log/hadoop-yarn
+            owner   => 'yarn',
+            group   => 'mapred',
+            mode    => '0755',
+            # Make sure HDFS directories are created before
+            # resourcemanager is installed and started, but after
+            # the namenode.
+            require => [Service['hadoop-hdfs-namenode'], Cdh::Hadoop::Directory['/var/log']],
+            before  => Package['hadoop-yarn-resourcemanager'],
+        }
     }
 
     package { 'hadoop-yarn-resourcemanager':
         ensure  => 'installed',
-        require => Cdh::Hadoop::Directory['/var/log/hadoop-yarn'],
     }
 
     service { 'hadoop-yarn-resourcemanager':
diff --git a/templates/hadoop/yarn-site.xml.erb b/templates/hadoop/yarn-site.xml.erb
index 7548889..a94f089 100644
--- a/templates/hadoop/yarn-site.xml.erb
+++ b/templates/hadoop/yarn-site.xml.erb
@@ -1,3 +1,13 @@
+<%
+# Convert a hostname to a Node ID.
+# We can't use '.' characters because IDs
+# will be used in the names of some Java properties,
+# which are '.' delimited.
+def host_to_id(host)
+    host.tr('.', '-')
+end
+
+-%>
 <?xml version="1.0"?>
 <!-- NOTE: This file is managed by Puppet. -->
@@ -7,10 +17,88 @@
 
 <configuration>
 
+<% if @yarn_ha_enabled -%>
+  <property>
+    <name>yarn.resourcemanager.cluster-id</name>
+    <value><%= @yarn_cluster_id %></value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.ha.rm-ids</name>
+    <value><%= @resourcemanager_hosts.sort.collect { |host| host_to_id(host) }.join(',') %></value>
+  </property>
+
+<% if @resourcemanager_hosts.include?(@fqdn) -%>
+  <property>
+    <name>yarn.resourcemanager.ha.id</name>
+    <value><%= host_to_id(@fqdn) %></value>
+  </property>
+<% end -%>
+
+  <property>
+    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
+    <value>2000</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.ha.enabled</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.recovery.enabled</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.store.class</name>
+    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.zk-address</name>
+    <value><%= Array(@zookeeper_hosts).sort.join(',') %></value>
+  </property>
+
+  <property>
+    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
+    <value>5000</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.am.max-attempts</name>
+    <value>6</value>
+  </property>
+
+<% @resourcemanager_hosts.sort.each do |host| -%>
+  <property>
+    <name>yarn.resourcemanager.hostname.<%= host_to_id(host) %></name>
+    <value><%= host %></value>
+  </property>
+<% end # @resourcemanager_hosts.each -%>
+
+<% else -%>
   <property>
     <name>yarn.resourcemanager.hostname</name>
-    <value><%= @primary_namenode_host %></value>
+    <value><%= @primary_resourcemanager_host %></value>
   </property>
+<% end # if @yarn_ha_enabled -%>
+
 <% if @fair_scheduler_enabled -%>
   <property>

-- 
To view, visit https://gerrit.wikimedia.org/r/209019
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I614968e8892392bfa1f0cf6e579a1f79d931682a
Gerrit-PatchSet: 8
Gerrit-Project: operations/puppet/cdh
Gerrit-Branch: master
Gerrit-Owner: Ottomata <o...@wikimedia.org>
Gerrit-Reviewer: Ottomata <o...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
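[Editor's note: the ```host_to_id``` helper that the yarn-site.xml.erb diff above introduces is plain Ruby and can be exercised outside the template. A minimal sketch of what the template computes for ```yarn.resourcemanager.ha.rm-ids``` (the hostnames here are illustrative examples, not real hosts):]

```ruby
# Standalone copy of the host_to_id helper from templates/hadoop/yarn-site.xml.erb.
# RM IDs end up inside '.'-delimited Java property names
# (e.g. yarn.resourcemanager.hostname.<id>), so dots in hostnames
# are replaced with dashes.
def host_to_id(host)
  host.tr('.', '-')
end

# The template sorts the ResourceManager hosts, converts each to an ID,
# and joins them with commas for yarn.resourcemanager.ha.rm-ids:
hosts = ['namenode2.domain.org', 'namenode1.domain.org']  # example hosts
rm_ids = hosts.sort.collect { |host| host_to_id(host) }.join(',')
puts rm_ids  # namenode1-domain-org,namenode2-domain-org
```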