Ottomata has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/71569


Change subject: Puppetizing hive client, server and metastore.
......................................................................

Puppetizing hive client, server and metastore.

This change is not yet ready for review!

Change-Id: Ie7a024d371526d59c1f124230e3cce8342b1791c
---
M README.md
M manifests/hive.pp
A manifests/hive/defaults.pp
A manifests/hive/master.pp
A manifests/hive/metastore.pp
A manifests/hive/metastore/mysql.pp
A manifests/hive/server.pp
M manifests/sqoop.pp
A templates/hive/hive-site.xml.erb
M tests/Makefile
M tests/hive.pp
A tests/hive_master.pp
A tests/hive_metastore.pp
A tests/hive_metastore_mysql.pp
A tests/hive_server.pp
15 files changed, 607 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet/cdh4 
refs/changes/69/71569/1

diff --git a/README.md b/README.md
index 549c141..1849441 100644
--- a/README.md
+++ b/README.md
@@ -4,15 +4,19 @@
 Cloudera's Distribution 4 (CDH4) for Apache Hadoop.
 
 # Description
+
 Installs HDFS, YARN or MR1, Hive, HBase, Pig, Sqoop, Zookeeper, Oozie and
 Hue.  Note that, in order for this module to work, you will have to ensure
 that:
 
-* Sun JRE version 6 or greater is installed
-* Your package manager is configured with a repository containing the
+- Sun JRE version 6 or greater is installed
+- Your package manager is configured with a repository containing the
   Cloudera 4 packages.
 
+This module has been tested using CDH 4.2.1 on Ubuntu Precise 12.04.2 LTS.
+
 # Installation:
+
 Clone (or copy) this repository into your puppet modules/cdh4 directory:
 ```bash
 git clone git://github.com/wikimedia/cloudera-cdh4-puppet.git modules/cdh4
@@ -28,6 +32,7 @@
 # Usage
 
 ## For all Hadoop nodes:
+
 ```puppet
 
 include cdh4
@@ -51,14 +56,45 @@
 If you would like to use MRv1 instead of YARN, set ```use_yarn``` to false.
 
 ## For your Hadoop master node:
+
 ```puppet
 include cdh4::hadoop::master
 ```
 This installs and starts up the NameNode.  If using YARN, this will install and set up the ResourceManager and HistoryServer.  If using MRv1, this will install and set up the JobTracker.
 
 ### For your Hadoop worker nodes:
+
 ```puppet
 include cdh4::hadoop::worker
 ```
 
 This installs and starts up the DataNode.  If using YARN, this will install and set up the NodeManager.  If using MRv1, this will install and set up the TaskTracker.
+
+## For all Hive enabled nodes:
+
+```puppet
+class { 'cdh4::hive':
+  metastore_host  => 'hive-metastore-node.domain.org',
+  zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+  jdbc_password   => $secret_password,
+}
+```
+
+## For your Hive master node (hive-server2 and hive-metastore):
+
+Include the same ```cdh4::hive``` class as indicated above, and then:
+
+```puppet
+class { 'cdh4::hive::master': }
+```
+
+By default, a MySQL database will be used as the Hive metastore backend.  You
+must separately ensure that your $metastore_database (e.g. mysql) package is
+installed.  If you want to disable automatic setup of your metastore backend
+database, set the ```metastore_database``` parameter to undef:
+
+```puppet
+class { 'cdh4::hive::master':
+  metastore_database => undef,
+}
+```
diff --git a/manifests/hive.pp b/manifests/hive.pp
index 71ca4bf..6156bdf 100644
--- a/manifests/hive.pp
+++ b/manifests/hive.pp
@@ -1,10 +1,86 @@
 # == Class cdh4::hive
 #
 # Installs Hive packages (needed for Hive Client).
-# Use cdh4::hive::server to install and set up a Hive server.
+# Use this in conjunction with cdh4::hive::master to install and set up a
+# Hive Server and Hive Metastore.
 #
-class cdh4::hive {
+# == Parameters
+# $metastore_host                - FQDN of the Hive metastore host.
+# $zookeeper_hosts               - Array of zookeeper hostname/IP(:port)s.
+#                                  Default: undef (zookeeper lock management
+#                                  will not be used).
+#
+# $jdbc_database                 - Metastore JDBC database name.
+#                                  Default: 'hive_metastore'
+# $jdbc_username                 - Metastore JDBC username.  Default: 'hive'
+# $jdbc_password                 - Metastore JDBC password.  Default: 'hive'
+# $jdbc_host                     - Metastore JDBC hostname.  Default: 'localhost'
+# $jdbc_driver                   - Metastore JDBC driver class name.
+#                                  Default: 'org.apache.derby.jdbc.EmbeddedDriver'
+# $jdbc_protocol                 - Metastore JDBC protocol.  Default: 'mysql'
+#
+# $exec_parallel_thread_number   - Maximum number of jobs that can be executed
+#                                  in parallel.  Set this to 0 to disable
+#                                  parallel execution.  Default: 8
+# $optimize_skewjoin             - Enable or disable skew join optimization.
+#                                  Default: false
+# $skewjoin_key                  - Number of rows with the same key above which
+#                                  a join key is treated as a skew join key.
+#                                  Default: 10000
+# $skewjoin_mapjoin_map_tasks    - Number of map tasks used in the follow-up
+#                                  map join job for a skew join.  Default: 10000
+# $skewjoin_mapjoin_min_split    - Skew join minimum split size.
+#                                  Default: 33554432
+#
+# $stats_enabled                 - Enable or disable temporary Hive stats.
+#                                  Default: false
+# $stats_dbclass                 - The default database class that stores
+#                                  temporary hive statistics.  Default: 'jdbc:derby'
+# $stats_jdbcdriver              - JDBC driver for the database that stores
+#                                  temporary hive statistics.
+#                                  Default: 'org.apache.derby.jdbc.EmbeddedDriver'
+# $stats_dbconnectionstring      - Connection string for the database that
+#                                  stores temporary hive statistics.
+#                                  Default: 'jdbc:derby:;databaseName=TempStatsStore;create=true'
+#
+class cdh4::hive(
+    $metastore_host,
+    $zookeeper_hosts             = $cdh4::hive::defaults::zookeeper_hosts,
+
+    $jdbc_database               = $cdh4::hive::defaults::jdbc_database,
+    $jdbc_username               = $cdh4::hive::defaults::jdbc_username,
+    $jdbc_password               = $cdh4::hive::defaults::jdbc_password,
+    $jdbc_host                   = $cdh4::hive::defaults::jdbc_host,
+    $jdbc_driver                 = $cdh4::hive::defaults::jdbc_driver,
+    $jdbc_protocol               = $cdh4::hive::defaults::jdbc_protocol,
+
+    $exec_parallel_thread_number = $cdh4::hive::defaults::exec_parallel_thread_number,
+    $optimize_skewjoin           = $cdh4::hive::defaults::optimize_skewjoin,
+    $skewjoin_key                = $cdh4::hive::defaults::skewjoin_key,
+    $skewjoin_mapjoin_map_tasks  = $cdh4::hive::defaults::skewjoin_mapjoin_map_tasks,
+    $skewjoin_mapjoin_min_split  = $cdh4::hive::defaults::skewjoin_mapjoin_min_split,
+
+    $stats_enabled               = $cdh4::hive::defaults::stats_enabled,
+    $stats_dbclass               = $cdh4::hive::defaults::stats_dbclass,
+    $stats_jdbcdriver            = $cdh4::hive::defaults::stats_jdbcdriver,
+    $stats_dbconnectionstring    = $cdh4::hive::defaults::stats_dbconnectionstring,
+
+    $hive_site_template          = $cdh4::hive::defaults::hive_site_template,
+    $hive_exec_log4j_template    = $cdh4::hive::defaults::hive_exec_log4j_template
+) inherits cdh4::hive::defaults
+{
     package { 'hive':
         ensure => 'installed',
     }
+
+    # Make sure hive-site.xml is not world readable on the
+    # metastore host.  On the metastore host, hive-site.xml
+    # will contain database connection credentials.
+    $hive_site_mode = $metastore_host ? {
+        $::fqdn => '0640',
+        default => '0644',
+    }
+    file { '/etc/hive/conf/hive-site.xml':
+        content => template($hive_site_template),
+        mode    => $hive_site_mode,
+        require => Package['hive'],
+    }
+    file { '/etc/hive/conf/hive-exec-log4j.properties':
+        content => template($hive_exec_log4j_template),
+        require => Package['hive'],
+    }
 }
\ No newline at end of file
diff --git a/manifests/hive/defaults.pp b/manifests/hive/defaults.pp
new file mode 100644
index 0000000..52311f6
--- /dev/null
+++ b/manifests/hive/defaults.pp
@@ -0,0 +1,32 @@
+# == Class cdh4::hive::defaults
+# Default Hive configs
+#
+class cdh4::hive::defaults {
+    $zookeeper_hosts             = undef
+
+    $jdbc_driver                 = 'org.apache.derby.jdbc.EmbeddedDriver'
+    $jdbc_protocol               = 'mysql'
+    $jdbc_database               = 'hive_metastore'
+    $jdbc_host                   = 'localhost'
+    $jdbc_port                   = 3306
+    $jdbc_username               = 'hive'
+    $jdbc_password               = 'hive'
+
+    $exec_parallel_thread_number = 8  # set this to 0 to disable hive.exec.parallel
+    $optimize_skewjoin           = false
+    $skewjoin_key                = 10000
+    $skewjoin_mapjoin_map_tasks  = 10000
+    $skewjoin_mapjoin_min_split  = 33554432
+
+    $stats_enabled               = false
+    $stats_dbclass               = 'jdbc:derby'
+    $stats_jdbcdriver            = 'org.apache.derby.jdbc.EmbeddedDriver'
+    $stats_dbconnectionstring    = 'jdbc:derby:;databaseName=TempStatsStore;create=true'
+
+    # Default puppet paths to template config files.
+    # This allows us to use custom template config files
+    # if we want to override more settings than this
+    # module yet supports.
+    $hive_site_template          = 'cdh4/hive/hive-site.xml.erb'
+    $hive_exec_log4j_template    = 'cdh4/hive/hive-exec-log4j.properties.erb'
+}
\ No newline at end of file
diff --git a/manifests/hive/master.pp b/manifests/hive/master.pp
new file mode 100644
index 0000000..afb1041
--- /dev/null
+++ b/manifests/hive/master.pp
@@ -0,0 +1,31 @@
+# == Class cdh4::hive::master
+# Wrapper class for hive::server, hive::metastore, and hive::metastore::* databases.
+#
+# Include this class on your Hive master node with $metastore_database
+# set to one of the available metastore backend classes in the hive/metastore/
+# directory.  If you want to set up a hive metastore database backend that
+# is not supported here, you may set $metastore_database to undef.
+#
+# You must separately ensure that your $metastore_database (e.g. mysql) package
+# is installed.
+#
+# == Parameters
+# $metastore_database - Name of metastore database to use.  This should be
+#                       the name of a cdh4::hive::metastore::* class in
+#                       hive/metastore/*.pp.
+#
+class cdh4::hive::master($metastore_database = 'mysql') {
+    class { 'cdh4::hive::server':    }
+    class { 'cdh4::hive::metastore': }
+
+    # Set up the metastore database by including
+    # the $metastore_database_class.
+    $metastore_database_class = "cdh4::hive::metastore::${metastore_database}"
+    if ($metastore_database) {
+        class { $metastore_database_class: }
+
+        # Make sure the $metastore_database_class is included and set up
+        # before we start the hive-metastore service.  (This ordering must
+        # stay inside the conditional: referencing Class[...] for a class
+        # that was never declared would fail catalog compilation when
+        # $metastore_database is undef.)
+        Class[$metastore_database_class] -> Class['cdh4::hive::metastore']
+    }
+}
\ No newline at end of file
diff --git a/manifests/hive/metastore.pp b/manifests/hive/metastore.pp
new file mode 100644
index 0000000..f806a70
--- /dev/null
+++ b/manifests/hive/metastore.pp
@@ -0,0 +1,17 @@
+# == Class cdh4::hive::metastore
+#
+class cdh4::hive::metastore
+{
+    Class['cdh4::hive'] -> Class['cdh4::hive::metastore']
+
+    package { 'hive-metastore':
+        ensure => 'installed',
+    }
+
+    service { 'hive-metastore':
+        ensure     => 'running',
+        require    => Package['hive-metastore'],
+        hasrestart => true,
+        hasstatus  => true,
+    }
+}
\ No newline at end of file
diff --git a/manifests/hive/metastore/mysql.pp 
b/manifests/hive/metastore/mysql.pp
new file mode 100644
index 0000000..2ce0e69
--- /dev/null
+++ b/manifests/hive/metastore/mysql.pp
@@ -0,0 +1,57 @@
+# == Class cdh4::hive::metastore::mysql
+# Configures and sets up a MySQL metastore for Hive.
+#
+# Note that this class does not support running
+# the Metastore database on a different host than where your
+# hive-metastore service will run.  Permissions will only be granted
+# for localhost MySQL users, so hive-metastore must run on this node.
+#
+# Also, root must be able to run /usr/bin/mysql without a password, and must
+# have permission to create databases and users and to grant privileges.
+#
+# See: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_hive_metastore_configure.html
+#
+# == Parameters
+# $schema_version - When installing the metastore database, this version of
+#                   the schema will be created.  This must match an .sql file
+#                   schema version found in /usr/lib/hive/scripts/metastore/upgrade/mysql.
+#                   Default: 0.10.0
+#
+class cdh4::hive::metastore::mysql($schema_version = '0.10.0') {
+    Class['cdh4::hive'] -> Class['cdh4::hive::metastore::mysql']
+
+    if (!defined(Package['libmysql-java'])) {
+        package { 'libmysql-java':
+            ensure => 'installed',
+        }
+    }
+    # symlink the mysql.jar into /usr/lib/hive/lib
+    file { '/usr/lib/hive/lib/libmysql-java.jar':
+        ensure  => 'link',
+        target  => '/usr/share/java/mysql.jar',
+        require => Package['libmysql-java'],
+    }
+
+    $db_name = $cdh4::hive::jdbc_database
+    $db_user = $cdh4::hive::jdbc_username
+    $db_pass = $cdh4::hive::jdbc_password
+
+    # Hive is going to need a metastore database and user.
+    exec { 'hive_mysql_create_database':
+        command => "/usr/bin/mysql -e \"
+CREATE DATABASE ${db_name}; USE ${db_name};
+SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-${schema_version}.mysql.sql;\"",
+        unless  => "/usr/bin/mysql -e 'SHOW DATABASES' | /bin/grep -q ${db_name}",
+        user    => 'root',
+    }
+    exec { 'hive_mysql_create_user':
+        command => "/usr/bin/mysql -e \"
+CREATE USER '${db_user}'@'localhost' IDENTIFIED BY '${db_pass}';
+CREATE USER '${db_user}'@'127.0.0.1' IDENTIFIED BY '${db_pass}';
+GRANT ALL PRIVILEGES ON ${db_name}.* TO '${db_user}'@'localhost' WITH GRANT OPTION;
+GRANT ALL PRIVILEGES ON ${db_name}.* TO '${db_user}'@'127.0.0.1' WITH GRANT OPTION;
+FLUSH PRIVILEGES;\"",
+        unless  => "/usr/bin/mysql -e \"SHOW GRANTS FOR '${db_user}'@'127.0.0.1'\" | /bin/grep -q \"TO '${db_user}'\"",
+        user    => 'root',
+    }
+}
\ No newline at end of file
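A note on the `unless` guards above: `grep -q ${db_name}` is a substring match, so a similarly named database (say, a hypothetical hive_metastore_backup) would also satisfy it and schema creation would be skipped even though the real database is absent.  A self-contained sketch with simulated `SHOW DATABASES` output (no real MySQL involved; the database names here are illustrative only) — an anchored whole-line match via `grep -x` makes the guard exact:

```shell
# Simulated `SHOW DATABASES` output; "hive_metastore" itself is absent.
dbs='information_schema
hive_metastore_backup
mysql'

# Substring match: succeeds anyway, which would wrongly skip the exec.
echo "$dbs" | grep -q 'hive_metastore' && echo 'substring match'

# Anchored whole-line match: correctly fails for the missing database.
echo "$dbs" | grep -qx 'hive_metastore' || echo 'no exact match'
```

The same consideration applies to the `SHOW GRANTS` guard, though the `TO '${db_user}'` pattern there is already reasonably specific.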
diff --git a/manifests/hive/server.pp b/manifests/hive/server.pp
new file mode 100644
index 0000000..e5bdc0a
--- /dev/null
+++ b/manifests/hive/server.pp
@@ -0,0 +1,43 @@
+# == Class cdh4::hive::server
+# Configures hive-server2.  Requires that cdh4::hadoop is included so that
+# hadoop-client is available to create hive HDFS directories.
+#
+# See: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_18_5.html
+#
+class cdh4::hive::server
+{
+    # cdh4::hive::server requires that the hadoop client and configs are installed.
+    Class['cdh4::hadoop'] -> Class['cdh4::hive::server']
+    Class['cdh4::hive']   -> Class['cdh4::hive::server']
+
+    package { 'hive-server2':
+        ensure => 'installed',
+        alias  => 'hive-server',
+    }
+
+    # sudo -u hdfs hadoop fs -mkdir /user/hive
+    # sudo -u hdfs hadoop fs -chmod 0775 /user/hive
+    # sudo -u hdfs hadoop fs -chown hive:hadoop /user/hive
+    cdh4::hadoop::directory { '/user/hive':
+        owner   => 'hive',
+        group   => 'hadoop',
+        mode    => '0775',
+        require => Package['hive'],
+    }
+    # sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse
+    # sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
+    # sudo -u hdfs hadoop fs -chown hive:hadoop /user/hive/warehouse
+    cdh4::hadoop::directory { '/user/hive/warehouse':
+        owner   => 'hive',
+        group   => 'hadoop',
+        mode    => '1777',
+        require => Cdh4::Hadoop::Directory['/user/hive'],
+    }
+
+    service { 'hive-server2':
+        ensure     => 'running',
+        require    => Package['hive-server2'],
+        hasrestart => true,
+        hasstatus  => true,
+    }
+}
\ No newline at end of file
diff --git a/manifests/sqoop.pp b/manifests/sqoop.pp
index c8e770d..eb60e8e 100644
--- a/manifests/sqoop.pp
+++ b/manifests/sqoop.pp
@@ -1,13 +1,18 @@
 # == Class cdh4::sqoop
 # Installs Sqoop
 class cdh4::sqoop {
-    package { ['sqoop', 'libmysql-java']:
+    package { 'sqoop':
         ensure => 'installed',
     }
 
+    if (!defined(Package['libmysql-java'])) {
+        package { 'libmysql-java':
+            ensure => 'installed',
+        }
+    }
     # symlink the mysql-connector-java.jar that is installed by
     # libmysql-java into /usr/lib/sqoop/lib
-
+    # TODO: Can I create this symlink as mysql.jar?
     file { '/usr/lib/sqoop/lib/mysql-connector-java.jar':
         ensure  => 'link',
         target  => '/usr/share/java/mysql-connector-java.jar',
diff --git a/templates/hive/hive-site.xml.erb b/templates/hive/hive-site.xml.erb
new file mode 100644
index 0000000..4f8205c
--- /dev/null
+++ b/templates/hive/hive-site.xml.erb
@@ -0,0 +1,256 @@
+<?xml version="1.0"?>
+<!-- NOTE:  This file is managed by Puppet. -->
+
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+
+<configuration>
+
+  <!-- Hive Configuration can either be stored in this file or in the hadoop configuration files  -->
+  <!-- that are implied by Hadoop setup variables.                                                -->
+  <!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive    -->
+  <!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
+  <!-- resource).                                                                                 -->
+
+  <!-- Hive metastore configuration -->
+  <property>
+    <name>hive.metastore.uris</name>
+    <value>thrift://<%= @metastore_host %>:9083</value>
+    <description>Fully-qualified domain name and port of the metastore host</description>
+  </property>
+<%
+# if this node is the metastore_host, then render out
+# metastore backend database connection credentials
+if (@metastore_host == @fqdn)
+-%>
+
+  <property>
+    <name>javax.jdo.option.ConnectionURL</name>
+    <value><%= "jdbc:#{@jdbc_protocol}://#{@jdbc_host}:#{@jdbc_port}/#{@jdbc_database}" %></value>
+    <description>JDBC connect string for a JDBC metastore</description>
+  </property>
+
+  <property>
+    <name>javax.jdo.option.ConnectionDriverName</name>
+    <value><%= @jdbc_driver %></value>
+    <description>Driver class name for a JDBC metastore</description>
+  </property>
+
+  <% if @jdbc_username -%>
+  <property>
+    <name>javax.jdo.option.ConnectionUserName</name>
+    <value><%= @jdbc_username %></value>
+  </property>
+  <% end -%>
+
+  <% if @jdbc_password and @jdbc_password.empty? == false -%>
+  <property>
+    <name>javax.jdo.option.ConnectionPassword</name>
+    <value><%= @jdbc_password %></value>
+  </property>
+  <% end -%>
+
+  <property>
+    <name>datanucleus.autoCreateSchema</name>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>datanucleus.fixedDatastore</name>
+    <value>true</value>
+  </property>
+
+<% end -%>
+<% if @zookeeper_hosts -%>
+  <!-- Hive can use Zookeeper for table lock management -->
+  <property>
+    <name>hive.support.concurrency</name>
+    <description>Enable Hive's Table Lock Manager Service</description>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>hive.zookeeper.quorum</name>
+    <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
+    <value><%= @zookeeper_hosts.sort.join(',') %></value>
+  </property>
+<% end -%>
+
+
+  <!-- Hive Execution Parameters -->
+  <property>
+    <name>hive.cli.print.current.db</name>
+    <description>Whether to include the current database in the hive prompt.</description>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>hive.cli.print.header</name>
+    <description>Whether to print the names of the columns in query output.</description>
+    <value>true</value>
+  </property>
+
+
+  <property>
+    <name>hive.mapred.mode</name>
+    <description>
+      The mode in which the hive operations are being performed.
+      In strict mode, some risky queries are not allowed to run.  They include:
+      Cartesian Product.
+      No partition being picked up for a query.
+      Comparing bigints and strings.
+      Comparing bigints and doubles.
+      Order by without limit.
+    </description>
+    <value>strict</value>
+  </property>
+
+  <property>
+    <name>hive.start.cleanup.scratchdir</name>
+    <description>Whether to clean up the hive scratchdir when starting the hive server.</description>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>hive.error.on.empty.partition</name>
+    <description>Whether to throw an exception if dynamic partition insert generates empty results.</description>
+    <value>true</value>
+  </property>
+
+  <property>
+    <name>hive.insert.into.external.tables</name>
+    <description>https://issues.apache.org/jira/browse/HIVE-2837</description>
+    <value>false</value>
+  </property>
+
+  <property>
+    <name>hive.exec.parallel</name>
+    <description>Whether to execute jobs in parallel</description>
+    <value><%= @exec_parallel_thread_number.to_i > 0 ? 'true' : 'false' %></value>
+  </property>
+
+  <property>
+    <name>hive.exec.parallel.thread.number</name>
+    <description>How many jobs at most can be executed in parallel</description>
+    <value><%= @exec_parallel_thread_number %></value>
+  </property>
+
+<% if @optimize_skewjoin -%>
+  <property>
+    <name>hive.optimize.skewjoin</name>
+    <value><%= @optimize_skewjoin %></value>
+    <description>
+      Whether to enable skew join optimization.
+      The algorithm is as follows: At runtime, detect the keys with a large skew.
+      Instead of processing those keys, store them temporarily in an HDFS
+      directory.  In a follow-up map-reduce job, process those skewed keys.
+      The same key need not be skewed for all the tables, and so the follow-up
+      map-reduce job (for the skewed keys) would be much faster, since it would
+      be a map-join.
+    </description>
+  </property>
+
+  <property>
+    <name>hive.skewjoin.key</name>
+    <value><%= @skewjoin_key %></value>
+    <description>
+      Determines whether a join key is skewed: if more than the specified
+      number of rows with the same key are seen in the join operator, the
+      key is treated as a skew join key.
+    </description>
+  </property>
+
+  <property>
+    <name>hive.skewjoin.mapjoin.map.tasks</name>
+    <value><%= @skewjoin_mapjoin_map_tasks %></value>
+    <description>
+      Determine the number of map tasks used in the follow-up map join job
+      for a skew join.  It should be used together with
+      hive.skewjoin.mapjoin.min.split to perform a fine grained control.
+    </description>
+  </property>
+
+  <property>
+    <name>hive.skewjoin.mapjoin.min.split</name>
+    <value><%= @skewjoin_mapjoin_min_split %></value>
+    <description>
+      Determine the number of map tasks at most used in the follow-up map join
+      job for a skew join by specifying the minimum split size.  It should be
+      used together with hive.skewjoin.mapjoin.map.tasks to perform a fine
+      grained control.
+    </description>
+  </property>
+<% end -%>
+
+<% if @stats_enabled -%>
+  <!-- Hive stats configuration -->
+  <property>
+    <name>hive.stats.dbclass</name>
+    <value><%= @stats_dbclass %></value>
+    <description>The default database class that stores temporary hive statistics.</description>
+  </property>
+
+  <property>
+    <name>hive.stats.jdbcdriver</name>
+    <value><%= @stats_jdbcdriver %></value>
+    <description>The JDBC driver for the database that stores temporary hive statistics.</description>
+  </property>
+
+  <property>
+    <name>hive.stats.dbconnectionstring</name>
+    <value><%= @stats_dbconnectionstring %></value>
+    <description>The default connection string for the database that stores temporary hive statistics.</description>
+  </property>
+
+  <property>
+    <name>hive.stats.autogather</name>
+    <value>true</value>
+    <description>A flag to gather statistics automatically during the INSERT OVERWRITE command.</description>
+  </property>
+
+  <property>
+    <name>hive.stats.default.publisher</name>
+    <value></value>
+    <description>The Java class (implementing the StatsPublisher interface) that is used by default if hive.stats.dbclass is not JDBC or HBase.</description>
+  </property>
+
+  <property>
+    <name>hive.stats.default.aggregator</name>
+    <value></value>
+    <description>The Java class (implementing the StatsAggregator interface) that is used by default if hive.stats.dbclass is not JDBC or HBase.</description>
+  </property>
+
+  <property>
+    <name>hive.stats.jdbc.timeout</name>
+    <value>30</value>
+    <description>Timeout value (number of seconds) used by JDBC connection and statements.</description>
+  </property>
+
+  <property>
+    <name>hive.stats.retries.max</name>
+    <value>0</value>
+    <description>Maximum number of retries when the stats publisher/aggregator gets an exception updating the intermediate database.  Default is no retries on failures.</description>
+  </property>
+
+  <property>
+    <name>hive.stats.retries.wait</name>
+    <value>3000</value>
+    <description>The base waiting window (in milliseconds) before the next retry.  The actual wait time is calculated by baseWindow * failures + baseWindow * (failures + 1) * (random number between [0.0,1.0]).</description>
+  </property>
+
+  <property>
+    <name>hive.stats.reliable</name>
+    <value>false</value>
+    <description>Whether queries will fail if stats cannot be collected completely accurately.
+      If this is set to true, reading/writing from/into a partition may fail because the stats
+      could not be computed accurately.
+    </description>
+  </property>
+
+  <property>
+    <name>hive.stats.collect.tablekeys</name>
+    <value>true</value>
+    <description>Whether join and group by keys on tables are derived and maintained in the QueryPlan.
+      This is useful to identify how tables are accessed and to determine if they should be bucketed.
+    </description>
+  </property>
+
+<% end -%>
+</configuration>
\ No newline at end of file
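The template derives hive.exec.parallel from $exec_parallel_thread_number with an ERB ternary (`to_i > 0 ? 'true' : 'false'`): parallel execution is on exactly when the thread count is positive, and a count of 0 disables it.  The same decision, sketched in shell with an assumed example value of 8:

```shell
# Mirror of the ERB ternary in hive-site.xml.erb: enable parallel
# execution only when the configured thread count is positive.
exec_parallel_thread_number=8   # assumption: the module default
if [ "$exec_parallel_thread_number" -gt 0 ]; then
  hive_exec_parallel=true
else
  hive_exec_parallel=false
fi
echo "$hive_exec_parallel"   # prints "true" for any positive count
```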
diff --git a/tests/Makefile b/tests/Makefile
index b1acb3b..458f5a6 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -1,4 +1,4 @@
-MANIFESTS=datanode.po defaults.po hadoop.po historyserver.po  hive.po jobtracker.po Makefile master.po namenode.po nodemanager.po pig.po resourcemanager.po sqoop.po tasktracker.po worker.po
+MANIFESTS=datanode.po defaults.po hadoop.po historyserver.po hive.po hive_master.po hive_metastore.po hive_metastore_mysql.po hive_server.po jobtracker.po Makefile master.po namenode.po nodemanager.po pig.po resourcemanager.po sqoop.po tasktracker.po worker.po
 TESTS_DIR=$(dir $(CURDIR))
 MODULE_DIR=$(TESTS_DIR:/=)
 MODULES_DIR=$(dir $(MODULE_DIR))
diff --git a/tests/hive.pp b/tests/hive.pp
index 05d63da..8986e8e 100644
--- a/tests/hive.pp
+++ b/tests/hive.pp
@@ -1,2 +1,6 @@
-
-include cdh4::hive
\ No newline at end of file
+$fqdn = 'hive1.domain.org'
+class { 'cdh4::hive':
+  metastore_host  => $fqdn,
+  zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+  jdbc_password   => 'test',
+}
\ No newline at end of file
diff --git a/tests/hive_master.pp b/tests/hive_master.pp
new file mode 100644
index 0000000..1198826
--- /dev/null
+++ b/tests/hive_master.pp
@@ -0,0 +1,12 @@
+$fqdn = 'hive1.domain.org'
+class { '::cdh4::hadoop':
+  namenode_hostname    => 'localhost',
+  dfs_name_dir         => '/var/lib/hadoop/name',
+}
+
+class { 'cdh4::hive':
+  metastore_host  => $fqdn,
+  zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+  jdbc_password   => 'test',
+}
+class { 'cdh4::hive::master': }
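The hive-site.xml template renders hive.zookeeper.quorum as `@zookeeper_hosts.sort.join(',')`, so the quorum string comes out the same no matter what order hosts are listed in manifests like this one.  The same normalization, sketched in shell using the placeholder hostnames from these tests:

```shell
# Sort and comma-join the zookeeper hosts, as the template does,
# so the rendered quorum string is stable across host orderings.
printf '%s\n' zk2.domain.org zk1.domain.org | sort | paste -sd, -
# prints "zk1.domain.org,zk2.domain.org"
```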
diff --git a/tests/hive_metastore.pp b/tests/hive_metastore.pp
new file mode 100644
index 0000000..ae6fb71
--- /dev/null
+++ b/tests/hive_metastore.pp
@@ -0,0 +1,8 @@
+$fqdn = 'hive1.domain.org'
+class { 'cdh4::hive':
+  metastore_host  => $fqdn,
+  zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+  jdbc_password   => 'test',
+}
+class { 'cdh4::hive::metastore': }
+
diff --git a/tests/hive_metastore_mysql.pp b/tests/hive_metastore_mysql.pp
new file mode 100644
index 0000000..cebff0c
--- /dev/null
+++ b/tests/hive_metastore_mysql.pp
@@ -0,0 +1,8 @@
+$fqdn = 'hive1.domain.org'
+class { 'cdh4::hive':
+  metastore_host  => $fqdn,
+  zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+  jdbc_password   => 'test',
+}
+class { 'cdh4::hive::metastore::mysql': }
+
diff --git a/tests/hive_server.pp b/tests/hive_server.pp
new file mode 100644
index 0000000..5697f83
--- /dev/null
+++ b/tests/hive_server.pp
@@ -0,0 +1,13 @@
+$fqdn = 'hive1.domain.org'
+class { '::cdh4::hadoop':
+  namenode_hostname    => 'localhost',
+  dfs_name_dir         => '/var/lib/hadoop/name',
+}
+
+class { 'cdh4::hive':
+  metastore_host  => $fqdn,
+  zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+  jdbc_password   => 'test',
+}
+class { 'cdh4::hive::server': }
+

-- 
To view, visit https://gerrit.wikimedia.org/r/71569
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie7a024d371526d59c1f124230e3cce8342b1791c
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet/cdh4
Gerrit-Branch: master
Gerrit-Owner: Ottomata <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
