Ottomata has submitted this change and it was merged.
Change subject: Puppetizing hive client, server and metastore.
......................................................................
Puppetizing hive client, server and metastore.
Change-Id: Ie7a024d371526d59c1f124230e3cce8342b1791c
---
M README.md
M TODO.md
M manifests/hive.pp
A manifests/hive/defaults.pp
A manifests/hive/master.pp
A manifests/hive/metastore.pp
A manifests/hive/metastore/mysql.pp
A manifests/hive/server.pp
M manifests/sqoop.pp
A templates/hive/hive-exec-log4j.properties.erb
A templates/hive/hive-site.xml.erb
M tests/Makefile
M tests/hive.pp
A tests/hive_master.pp
A tests/hive_metastore.pp
A tests/hive_metastore_mysql.pp
A tests/hive_server.pp
17 files changed, 653 insertions(+), 30 deletions(-)
Approvals:
Ottomata: Verified; Looks good to me, approved
jenkins-bot: Verified
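
For reference, the classes added in this change can be combined on a single Hive master node roughly as follows. This is a sketch assembled from the README changes in the diff below; the node name, zookeeper hostnames, and $secret_password variable are placeholders:

```puppet
# Hypothetical node declaration; hostnames and the password variable are placeholders.
node 'hive-master.domain.org' {
  # Hive client config, required on all Hive-enabled nodes.
  class { 'cdh4::hive':
    metastore_host  => $::fqdn,
    zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
    jdbc_password   => $secret_password,
  }

  # hive-server2, hive-metastore, and (by default) the MySQL metastore backend.
  class { 'cdh4::hive::master': }
}
```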
diff --git a/README.md b/README.md
index 72b0365..ad6e231 100644
--- a/README.md
+++ b/README.md
@@ -4,18 +4,24 @@
Cloudera's Distribution 4 (CDH4) for Apache Hadoop.
# Description
-Installs HDFS, YARN or MR1, Hive, Pig, Sqoop, Zookeeper, Oozie and
+
+Installs HDFS, YARN or MR1, Hive, HBase, Pig, Sqoop, Zookeeper, Oozie and
Hue. Note that, in order for this module to work, you will have to ensure
that:
-* Sun JRE version 6 is installed.
-* Your package manager is configured with a repository containing the
+- Sun JRE version 6 or greater is installed
+- Your package manager is configured with a repository containing the
Cloudera 4 packages.
-Note that many of the above mentioned services are not yet implemented in v0.2.
-See the v0.1 branch if you'd like to use these now.
+Notes:
+
+- This module has only been tested using CDH 4.2.1 on Ubuntu Precise 12.04.2 LTS
+- Many of the above mentioned services are not yet implemented in v0.2.
+ See the v0.1 branch if you'd like to use these now.
+
# Installation:
+
Clone (or copy) this repository into your puppet modules/cdh4 directory:
```bash
git clone git://github.com/wikimedia/puppet-cdh4.git modules/cdh4
@@ -31,6 +37,7 @@
# Usage
## For all Hadoop nodes:
+
```puppet
include cdh4
@@ -54,6 +61,7 @@
If you would like to use MRv1 instead of YARN, set ```use_yarn``` to false.
## For your Hadoop master node:
+
```puppet
include cdh4::hadoop::master
```
@@ -62,25 +70,38 @@
and set up the JobTracker.
### For your Hadoop worker nodes:
+
```puppet
include cdh4::hadoop::worker
```
-This installs and starts up the DataNode. If using YARN, this will install
-and set up the NodeManager. If using MRv1, this will install and set up the
-TaskTracker.
+This installs and starts up the DataNode. If using YARN, this will install
+and set up the NodeManager. If using MRv1, this will install and set up the
+TaskTracker.
-# Notes:
+## For all Hive enabled nodes:
-## Coming Soon:
+```puppet
+class { 'cdh4::hive':
+ metastore_host => 'hive-metastore-node.domain.org',
+ zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+ jdbc_password => $secret_password,
+}
+```
-- Hive
-- Oozie
-- Hue
+## For your Hive master node (hive-server2 and hive-metastore):
-## History:
+Include the same ```cdh4::hive``` class as indicated above, and then:
-The original version of this module has been moved to the v0.1 branch.
-It is currently more feature full than v0.2 under development in master.
-v0.1 is no longer supported, and soon master will contain the same features
-as v0.1.
+```puppet
+class { 'cdh4::hive::master': }
+```
+
+By default, a Hive metastore backend MySQL database will be used. You must
+separately ensure that your $metastore_database (e.g. mysql) package is installed.
+If you want to disable automatic setup of your metastore backend
+database, set the ```metastore_database``` parameter to undef:
+
+```puppet
+class { 'cdh4::hive::master':
+ metastore_database => undef,
+}
+```
diff --git a/TODO.md b/TODO.md
index 04e2dd2..7f1f572 100644
--- a/TODO.md
+++ b/TODO.md
@@ -13,9 +13,6 @@
- Make JMX ports configurable.
- Make hadoop-metrics2.properties more configurable.
-## Hive
-- Hive Server + Hive Metastore
-
## Oozie
## Hue
diff --git a/manifests/hive.pp b/manifests/hive.pp
index 71ca4bf..a7df675 100644
--- a/manifests/hive.pp
+++ b/manifests/hive.pp
@@ -1,10 +1,86 @@
# == Class cdh4::hive
#
# Installs Hive packages (needed for Hive Client).
-# Use cdh4::hive::server to install and set up a Hive server.
+# Use this in conjunction with cdh4::hive::master to install and set up a
+# Hive Server and Hive Metastore.
#
-class cdh4::hive {
+# == Parameters
+# $metastore_host - fqdn of the metastore host
+# $zookeeper_hosts - Array of zookeeper hostname/IP(:port)s.
+# Default: undef (zookeeper lock management
+# will not be used).
+#
+# $jdbc_database - Metastore JDBC database name.
+# Default: 'hive_metastore'
+# $jdbc_username - Metastore JDBC username. Default: hive
+# $jdbc_password - Metastore JDBC password. Default: hive
+# $jdbc_host - Metastore JDBC hostname. Default: localhost
+# $jdbc_driver - Metastore JDBC driver class name.
+# Default: org.apache.derby.jdbc.EmbeddedDriver
+# $jdbc_protocol - Metastore JDBC protocol. Default: mysql
+#
+# $exec_parallel_thread_number - Maximum number of jobs that can be executed in parallel.
+# Set this to 0 to disable parallel execution.
+# $optimize_skewjoin - Enable or disable skew join optimization.
+# Default: false
+# $skewjoin_key - Number of rows where skew join is used.
+# Default: 10000
+# $skewjoin_mapjoin_map_tasks - Number of map tasks used in the follow up
+# map join job for a skew join. Default: 10000.
+# $skewjoin_mapjoin_min_split - Skew join minimum split size. Default: 33554432
+#
+# $stats_enabled - Enable or disable temp Hive stats. Default: false
+# $stats_dbclass - The default database class that stores
+# temporary hive statistics. Default: jdbc:derby
+# $stats_jdbcdriver - JDBC driver for the database that stores
+# temporary hive statistics.
+# Default: org.apache.derby.jdbc.EmbeddedDriver
+# $stats_dbconnectionstring - Connection string for the database that stores
+# temporary hive statistics.
+# Default: jdbc:derby:;databaseName=TempStatsStore;create=true
+#
+class cdh4::hive(
+ $metastore_host,
+ $zookeeper_hosts = $cdh4::hive::defaults::zookeeper_hosts,
+
+ $jdbc_database = $cdh4::hive::defaults::jdbc_database,
+ $jdbc_username = $cdh4::hive::defaults::jdbc_username,
+ $jdbc_password = $cdh4::hive::defaults::jdbc_password,
+ $jdbc_host = $cdh4::hive::defaults::jdbc_host,
+ $jdbc_driver = $cdh4::hive::defaults::jdbc_driver,
+ $jdbc_protocol = $cdh4::hive::defaults::jdbc_protocol,
+
+  $exec_parallel_thread_number = $cdh4::hive::defaults::exec_parallel_thread_number,
+ $optimize_skewjoin = $cdh4::hive::defaults::optimize_skewjoin,
+ $skewjoin_key = $cdh4::hive::defaults::skewjoin_key,
+  $skewjoin_mapjoin_map_tasks = $cdh4::hive::defaults::skewjoin_mapjoin_map_tasks,
+  $skewjoin_mapjoin_min_split = $cdh4::hive::defaults::skewjoin_mapjoin_min_split,
+
+ $stats_enabled = $cdh4::hive::defaults::stats_enabled,
+ $stats_dbclass = $cdh4::hive::defaults::stats_dbclass,
+ $stats_jdbcdriver = $cdh4::hive::defaults::stats_jdbcdriver,
+  $stats_dbconnectionstring = $cdh4::hive::defaults::stats_dbconnectionstring,
+
+ $hive_site_template = $cdh4::hive::defaults::hive_site_template,
+  $hive_exec_log4j_template = $cdh4::hive::defaults::hive_exec_log4j_template
+) inherits cdh4::hive::defaults
+{
package { 'hive':
ensure => 'installed',
}
+
+ # Make sure hive-site.xml is not world readable on the
+ # metastore host. On the metastore host, hive-site.xml
+ # will contain database connection credentials.
+ $hive_site_mode = $metastore_host ? {
+ $::fqdn => '0440',
+ default => '0444',
+ }
+ file { '/etc/hive/conf/hive-site.xml':
+ content => template($hive_site_template),
+ mode => $hive_site_mode,
+ require => Package['hive'],
+ }
+ file { '/etc/hive/conf/hive-exec-log4j.properties':
+ content => template($hive_exec_log4j_template),
+ }
}
\ No newline at end of file
diff --git a/manifests/hive/defaults.pp b/manifests/hive/defaults.pp
new file mode 100644
index 0000000..52311f6
--- /dev/null
+++ b/manifests/hive/defaults.pp
@@ -0,0 +1,32 @@
+# == Class hive::defaults
+# Default Hive configs
+#
+class cdh4::hive::defaults {
+ $zookeeper_hosts = undef
+
+ $jdbc_driver = 'org.apache.derby.jdbc.EmbeddedDriver'
+ $jdbc_protocol = 'mysql'
+ $jdbc_database = 'hive_metastore'
+ $jdbc_host = 'localhost'
+ $jdbc_port = 3306
+ $jdbc_username = 'hive'
+ $jdbc_password = 'hive'
+
+  $exec_parallel_thread_number = 8  # set this to 0 to disable hive.exec.parallel
+ $optimize_skewjoin = false
+ $skewjoin_key = 10000
+ $skewjoin_mapjoin_map_tasks = 10000
+ $skewjoin_mapjoin_min_split = 33554432
+
+ $stats_enabled = false
+ $stats_dbclass = 'jdbc:derby'
+ $stats_jdbcdriver = 'org.apache.derby.jdbc.EmbeddedDriver'
+  $stats_dbconnectionstring = 'jdbc:derby:;databaseName=TempStatsStore;create=true'
+
+ # Default puppet paths to template config files.
+ # This allows us to use custom template config files
+ # if we want to override more settings than this
+ # module yet supports.
+ $hive_site_template = 'cdh4/hive/hive-site.xml.erb'
+ $hive_exec_log4j_template = 'cdh4/hive/hive-exec-log4j.properties.erb'
+}
\ No newline at end of file
diff --git a/manifests/hive/master.pp b/manifests/hive/master.pp
new file mode 100644
index 0000000..afb1041
--- /dev/null
+++ b/manifests/hive/master.pp
@@ -0,0 +1,31 @@
+# == Class cdh4::hive::master
+# Wrapper class for hive::server, hive::metastore, and hive::metastore::* databases.
+#
+# Include this class on your Hive master node with $metastore_database
+# set to one of the available metastore backend classes in the hive/metastore/
+# directory. If you want to set up a hive metastore database backend that
+# is not supported here, you may set $metastore_database to undef.
+#
+# You must separately ensure that your $metastore_database (e.g. mysql) package
+# is installed.
+#
+# == Parameters
+# $metastore_database - Name of metastore database to use. This should be
+# the name of a cdh4::hive::metastore::* class in
+# hive/metastore/*.pp.
+#
+class cdh4::hive::master($metastore_database = 'mysql') {
+ class { 'cdh4::hive::server': }
+ class { 'cdh4::hive::metastore': }
+
+ # Set up the metastore database by including
+ # the $metastore_database_class.
+  if ($metastore_database) {
+    $metastore_database_class = "cdh4::hive::metastore::${metastore_database}"
+    class { $metastore_database_class: }
+
+    # Make sure the $metastore_database_class is included and set up
+    # before we start the hive-metastore service.
+    Class[$metastore_database_class] -> Class['cdh4::hive::metastore']
+  }
+}
\ No newline at end of file
diff --git a/manifests/hive/metastore.pp b/manifests/hive/metastore.pp
new file mode 100644
index 0000000..f806a70
--- /dev/null
+++ b/manifests/hive/metastore.pp
@@ -0,0 +1,17 @@
+# == Class cdh4::hive::metastore
+#
+class cdh4::hive::metastore
+{
+ Class['cdh4::hive'] -> Class['cdh4::hive::metastore']
+
+ package { 'hive-metastore':
+ ensure => 'installed',
+ }
+
+ service { 'hive-metastore':
+ ensure => 'running',
+ require => Package['hive-metastore'],
+ hasrestart => true,
+ hasstatus => true,
+ }
+}
\ No newline at end of file
diff --git a/manifests/hive/metastore/mysql.pp b/manifests/hive/metastore/mysql.pp
new file mode 100644
index 0000000..2ce0e69
--- /dev/null
+++ b/manifests/hive/metastore/mysql.pp
@@ -0,0 +1,57 @@
+# == Class cdh4::hive::metastore::mysql
+# Configures and sets up a MySQL metastore for Hive.
+#
+# Note that this class does not support running
+# the Metastore database on a different host than where your
+# hive-metastore service will run. Permissions will only be granted
+# for localhost MySQL users, so hive-metastore must run on this node.
+#
+# Also, root must be able to run /usr/bin/mysql with no password and have permissions
+# to create databases and users and grant permissions.
+#
+# See: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_hive_metastore_configure.html
+#
+# == Parameters
+# $schema_version - When installing the metastore database, this version of
+# the schema will be created. This must match an .sql file
+# schema version found in /usr/lib/hive/scripts/metastore/upgrade/mysql.
+# Default: 0.10.0
+#
+class cdh4::hive::metastore::mysql($schema_version = '0.10.0') {
+ Class['cdh4::hive'] -> Class['cdh4::hive::metastore::mysql']
+
+ if (!defined(Package['libmysql-java'])) {
+ package { 'libmysql-java':
+ ensure => 'installed',
+ }
+ }
+ # symlink the mysql.jar into /var/lib/hive/lib
+ file { '/usr/lib/hive/lib/libmysql-java.jar':
+ ensure => 'link',
+ target => '/usr/share/java/mysql.jar',
+ require => Package['libmysql-java'],
+ }
+
+ $db_name = $cdh4::hive::jdbc_database
+ $db_user = $cdh4::hive::jdbc_username
+ $db_pass = $cdh4::hive::jdbc_password
+
+  # Hive will need a hive metastore database and user.
+ exec { 'hive_mysql_create_database':
+ command => "/usr/bin/mysql -e \"
+CREATE DATABASE ${db_name}; USE ${db_name};
+SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-${schema_version}.mysql.sql;\"",
+    unless => "/usr/bin/mysql -e 'SHOW DATABASES' | /bin/grep -q ${db_name}",
+ user => 'root',
+ }
+ exec { 'hive_mysql_create_user':
+ command => "/usr/bin/mysql -e \"
+CREATE USER '${db_user}'@'localhost' IDENTIFIED BY '${db_pass}';
+CREATE USER '${db_user}'@'127.0.0.1' IDENTIFIED BY '${db_pass}';
+GRANT ALL PRIVILEGES ON ${db_name}.* TO '${db_user}'@'localhost' WITH GRANT OPTION;
+GRANT ALL PRIVILEGES ON ${db_name}.* TO '${db_user}'@'127.0.0.1' WITH GRANT OPTION;
+FLUSH PRIVILEGES;\"",
+    unless => "/usr/bin/mysql -e \"SHOW GRANTS FOR '${db_user}'@'127.0.0.1'\" | grep -q \"TO '${db_user}'\"",
+ user => 'root',
+ }
+}
\ No newline at end of file
diff --git a/manifests/hive/server.pp b/manifests/hive/server.pp
new file mode 100644
index 0000000..e5bdc0a
--- /dev/null
+++ b/manifests/hive/server.pp
@@ -0,0 +1,43 @@
+# == Class cdh4::hive::server
+# Configures hive-server2. Requires that cdh4::hadoop is included so that
+# hadoop-client is available to create hive HDFS directories.
+#
+# See: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_18_5.html
+#
+class cdh4::hive::server
+{
+ # cdh4::hive::server requires hadoop client and configs are installed.
+ Class['cdh4::hadoop'] -> Class['cdh4::hive::server']
+ Class['cdh4::hive'] -> Class['cdh4::hive::server']
+
+ package { 'hive-server2':
+ ensure => 'installed',
+ alias => 'hive-server',
+ }
+
+ # sudo -u hdfs hadoop fs -mkdir /user/hive
+ # sudo -u hdfs hadoop fs -chmod 0775 /user/hive
+ # sudo -u hdfs hadoop fs -chown hive:hadoop /user/hive
+ cdh4::hadoop::directory { '/user/hive':
+ owner => 'hive',
+ group => 'hadoop',
+ mode => '0775',
+ require => Package['hive'],
+ }
+ # sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse
+ # sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
+ # sudo -u hdfs hadoop fs -chown hive:hadoop /user/hive/warehouse
+ cdh4::hadoop::directory { '/user/hive/warehouse':
+ owner => 'hive',
+ group => 'hadoop',
+ mode => '1777',
+ require => Cdh4::Hadoop::Directory['/user/hive'],
+ }
+
+ service { 'hive-server2':
+ ensure => 'running',
+ require => Package['hive-server2'],
+ hasrestart => true,
+ hasstatus => true,
+ }
+}
\ No newline at end of file
diff --git a/manifests/sqoop.pp b/manifests/sqoop.pp
index c8e770d..eb60e8e 100644
--- a/manifests/sqoop.pp
+++ b/manifests/sqoop.pp
@@ -1,13 +1,18 @@
# == Class cdh4::sqoop
# Installs Sqoop
class cdh4::sqoop {
- package { ['sqoop', 'libmysql-java']:
+ package { 'sqoop':
ensure => 'installed',
}
+ if (!defined(Package['libmysql-java'])) {
+ package { 'libmysql-java':
+ ensure => 'installed',
+ }
+ }
# symlink the mysql-connector-java.jar that is installed by
# libmysql-java into /usr/lib/sqoop/lib
-
+ # TODO: Can I create this symlink as mysql.jar?
file { '/usr/lib/sqoop/lib/mysql-connector-java.jar':
ensure => 'link',
target => '/usr/share/java/mysql-connector-java.jar',
diff --git a/templates/hive/hive-exec-log4j.properties.erb b/templates/hive/hive-exec-log4j.properties.erb
new file mode 100644
index 0000000..591bc57
--- /dev/null
+++ b/templates/hive/hive-exec-log4j.properties.erb
@@ -0,0 +1,39 @@
+hive.log.threshold=INFO
+hive.root.logger=INFO,RFA
+hive.log.dir=/var/log/hive
+hive.log.file=${hive.query.id}.log
+
+# Define the root logger to the system property "hive.root.logger".
+log4j.rootLogger=${hive.root.logger}, EventCounter
+
+# Logging Threshold
+log4j.threshhold=${hive.log.threshold}
+
+#
+# Rolling File Appender - cap space usage at 512MB
+#
+hive.log.maxfilesize=256MB
+hive.log.maxbackupindex=2
+log4j.appender.RFA=org.apache.log4j.RollingFileAppender
+log4j.appender.RFA.File=${hive.log.dir}/${hive.log.file}
+log4j.appender.RFA.MaxFileSize=${hive.log.maxfilesize}
+log4j.appender.RFA.MaxBackupIndex=${hive.log.maxbackupindex}
+log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
+# Pattern format: Date LogLevel LoggerName LogMessage
+log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
+
+#
+# Event Counter Appender
+# Sends counts of logging messages at different severity levels to Hadoop Metrics.
+#
+log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
+
+log4j.category.DataNucleus=ERROR,RFA
+log4j.category.Datastore=ERROR,RFA
+log4j.category.Datastore.Schema=ERROR,RFA
+log4j.category.JPOX.Datastore=ERROR,RFA
+log4j.category.JPOX.Plugin=ERROR,RFA
+log4j.category.JPOX.MetaData=ERROR,RFA
+log4j.category.JPOX.Query=ERROR,RFA
+log4j.category.JPOX.General=ERROR,RFA
+log4j.category.JPOX.Enhancer=ERROR,RFA
diff --git a/templates/hive/hive-site.xml.erb b/templates/hive/hive-site.xml.erb
new file mode 100644
index 0000000..ac797e1
--- /dev/null
+++ b/templates/hive/hive-site.xml.erb
@@ -0,0 +1,259 @@
+<?xml version="1.0"?>
+<!-- NOTE: This file is managed by Puppet. -->
+
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+
+<configuration>
+
+  <!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
+  <!-- that are implied by Hadoop setup variables. -->
+  <!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
+  <!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
+  <!-- resource). -->
+
+ <!-- Hive metastore configuration -->
+ <property>
+ <name>hive.metastore.uris</name>
+ <value>thrift://<%= @metastore_host %>:9083</value>
+    <description>Fully-qualified domain name and port of the metastore host</description>
+ </property>
+<%
+
+# if this node is the metastore_host, then render out
+# metastore backend database connection credentials
+# TODO: Possibly use hive-default.xml + hive-site.xml to only
+# render hive-site.xml with these credentials on metastore_host.
+if (@metastore_host == @fqdn)
+-%>
+
+ <property>
+ <name>javax.jdo.option.ConnectionURL</name>
+    <value><%= "jdbc:#{@jdbc_protocol}://#{@jdbc_host}:#{@jdbc_port}/#{@jdbc_database}" %></value>
+ <description>JDBC connect string for a JDBC metastore</description>
+ </property>
+
+ <property>
+ <name>javax.jdo.option.ConnectionDriverName</name>
+ <value><%= @jdbc_driver %></value>
+ <description>Driver class name for a JDBC metastore</description>
+ </property>
+
+ <% if @jdbc_username -%>
+ <property>
+ <name>javax.jdo.option.ConnectionUserName</name>
+ <value><%= @jdbc_username %></value>
+ </property>
+ <% end -%>
+
+ <% if @jdbc_password and @jdbc_password.empty? == false -%>
+ <property>
+ <name>javax.jdo.option.ConnectionPassword</name>
+ <value><%= @jdbc_password %></value>
+ </property>
+ <% end -%>
+
+ <property>
+ <name>datanucleus.autoCreateSchema</name>
+ <value>false</value>
+ </property>
+
+ <property>
+ <name>datanucleus.fixedDatastore</name>
+ <value>true</value>
+ </property>
+
+<% end -%>
+<% if @zookeeper_hosts -%>
+ <!-- Hive can use Zookeeper for table lock management -->
+ <property>
+ <name>hive.support.concurrency</name>
+ <description>Enable Hive's Table Lock Manager Service</description>
+ <value>true</value>
+ </property>
+
+ <property>
+ <name>hive.zookeeper.quorum</name>
+    <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
+ <value><%= @zookeeper_hosts.sort.join(',') %></value>
+ </property>
+<% end -%>
+
+
+ <!-- Hive Execution Parameters -->
+ <property>
+ <name>hive.cli.print.current.db</name>
+    <description>Whether to include the current database in the hive prompt.</description>
+ <value>true</value>
+ </property>
+
+ <property>
+ <name>hive.cli.print.header</name>
+    <description>Whether to print the names of the columns in query output.</description>
+ <value>true</value>
+ </property>
+
+
+ <property>
+ <name>hive.mapred.mode</name>
+ <description>
+ The mode in which the hive operations are being performed.
+ In strict mode, some risky queries are not allowed to run. They include:
+ Cartesian Product.
+ No partition being picked up for a query.
+ Comparing bigints and strings.
+ Comparing bigints and doubles.
+ Orderby without limit.
+ </description>
+ <value>strict</value>
+ </property>
+
+ <property>
+ <name>hive.start.cleanup.scratchdir</name>
+    <description>To cleanup the hive scratchdir while starting the hive server.</description>
+ <value>true</value>
+ </property>
+
+ <property>
+ <name>hive.error.on.empty.partition</name>
+    <description>Whether to throw an exception if dynamic partition insert generates empty results.</description>
+ <value>true</value>
+ </property>
+
+ <property>
+ <name>hive.insert.into.external.tables</name>
+ <description>https://issues.apache.org/jira/browse/HIVE-2837</description>
+ <value>false</value>
+ </property>
+
+ <property>
+ <name>hive.exec.parallel</name>
+ <description>Whether to execute jobs in parallel</description>
+    <value><%= @exec_parallel_thread_number.to_i > 0 ? 'true' : 'false' %></value>
+ </property>
+
+ <property>
+ <name>hive.exec.parallel.thread.number</name>
+    <description>How many jobs at most can be executed in parallel</description>
+    <value><%= @exec_parallel_thread_number %></value>
+ </property>
+
+<% if @optimize_skewjoin -%>
+ <property>
+ <name>hive.optimize.skewjoin</name>
+ <value><%= @optimize_skewjoin %></value>
+ <description>
+ Whether to enable skew join optimization.
+    The algorithm is as follows: At runtime, detect the keys with a large skew. Instead of
+    processing those keys, store them temporarily in a hdfs directory. In a follow-up map-reduce
+    job, process those skewed keys. The same key need not be skewed for all the tables, and so,
+    the follow-up map-reduce job (for the skewed keys) would be much faster, since it would be a
+    map-join.
+ </description>
+ </property>
+
+ <property>
+ <name>hive.skewjoin.key</name>
+ <value><%= @skewjoin_key %></value>
+ <description>
+ Determine if we get a skew key in join. If we see more
+ than the specified number of rows with the same key in join operator,
+ we think the key as a skew join key.
+ </description>
+ </property>
+
+ <property>
+ <name>hive.skewjoin.mapjoin.map.tasks</name>
+ <value><%= @skewjoin_mapjoin_map_tasks %></value>
+ <description>
+ Determine the number of map task used in the follow up map join job
+    for a skew join. It should be used together with hive.skewjoin.mapjoin.min.split
+ to perform a fine grained control.
+ </description>
+ </property>
+
+ <property>
+ <name>hive.skewjoin.mapjoin.min.split</name>
+ <value><%= @skewjoin_mapjoin_min_split %></value>
+ <description>
+    Determine the number of map task at most used in the follow up map join job
+    for a skew join by specifying the minimum split size. It should be used together with
+ hive.skewjoin.mapjoin.map.tasks to perform a fine grained control.
+ </description>
+ </property>
+<% end -%>
+
+<% if @stats_enabled -%>
+ <!-- Hive stats configuration -->
+ <property>
+ <name>hive.stats.dbclass</name>
+ <value><%= @stats_dbclass %></value>
+    <description>The default database that stores temporary hive statistics.</description>
+ </property>
+
+ <property>
+ <name>hive.stats.jdbcdriver</name>
+ <value><%= @stats_jdbcdriver %></value>
+    <description>The JDBC driver for the database that stores temporary hive statistics.</description>
+ </property>
+
+ <property>
+ <name>hive.stats.dbconnectionstring</name>
+ <value><%= @stats_dbconnectionstring %></value>
+    <description>The default connection string for the database that stores temporary hive statistics.</description>
+ </property>
+
+ <property>
+ <name>hive.stats.autogather</name>
+ <value>true</value>
+    <description>A flag to gather statistics automatically during the INSERT OVERWRITE command.</description>
+ </property>
+
+ <property>
+ <name>hive.stats.default.publisher</name>
+ <value></value>
+    <description>The Java class (implementing the StatsPublisher interface) that is used by default if hive.stats.dbclass is not JDBC or HBase.</description>
+ </property>
+
+ <property>
+ <name>hive.stats.default.aggregator</name>
+ <value></value>
+    <description>The Java class (implementing the StatsAggregator interface) that is used by default if hive.stats.dbclass is not JDBC or HBase.</description>
+ </property>
+
+ <property>
+ <name>hive.stats.jdbc.timeout</name>
+ <value>30</value>
+    <description>Timeout value (number of seconds) used by JDBC connection and statements.</description>
+ </property>
+
+ <property>
+ <name>hive.stats.retries.max</name>
+ <value>0</value>
+    <description>Maximum number of retries when stats publisher/aggregator got an exception updating intermediate database. Default is no tries on failures.</description>
+ </property>
+
+ <property>
+ <name>hive.stats.retries.wait</name>
+ <value>3000</value>
+    <description>The base waiting window (in milliseconds) before the next retry. The actual wait time is calculated by baseWindow * failures + baseWindow * (failure + 1) * (random number between [0.0,1.0]).</description>
+ </property>
+
+ <property>
+ <name>hive.stats.reliable</name>
+ <value>false</value>
+    <description>Whether queries will fail because stats cannot be collected completely accurately.
+    If this is set to true, reading/writing from/into a partition may fail because the stats
+    could not be computed accurately.
+ </description>
+ </property>
+
+ <property>
+ <name>hive.stats.collect.tablekeys</name>
+ <value>true</value>
+    <description>Whether join and group by keys on tables are derived and maintained in the QueryPlan.
+    This is useful to identify how tables are accessed and to determine if they should be bucketed.
+ </description>
+ </property>
+
+<% end -%>
+</configuration>
\ No newline at end of file
diff --git a/tests/Makefile b/tests/Makefile
index b1acb3b..1b82aea 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -1,11 +1,12 @@
-MANIFESTS=datanode.po defaults.po hadoop.po historyserver.po hive.po jobtracker.po Makefile master.po namenode.po nodemanager.po pig.po resourcemanager.po sqoop.po tasktracker.po worker.po
+MANIFESTS=$(wildcard *.pp)
+OBJS=$(MANIFESTS:.pp=.po)
TESTS_DIR=$(dir $(CURDIR))
MODULE_DIR=$(TESTS_DIR:/=)
MODULES_DIR=$(dir $(MODULE_DIR))
all: test
-test: $(MANIFESTS)
+test: $(OBJS)
%.po: %.pp
- puppet apply --noop --modulepath $(MODULES_DIR) $<
+ puppet apply --noop --modulepath $(MODULES_DIR) $<
\ No newline at end of file
diff --git a/tests/hive.pp b/tests/hive.pp
index 05d63da..8986e8e 100644
--- a/tests/hive.pp
+++ b/tests/hive.pp
@@ -1,2 +1,6 @@
-
-include cdh4::hive
\ No newline at end of file
+$fqdn = 'hive1.domain.org'
+class { 'cdh4::hive':
+ metastore_host => $fqdn,
+ zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+ jdbc_password => 'test',
+}
\ No newline at end of file
diff --git a/tests/hive_master.pp b/tests/hive_master.pp
new file mode 100644
index 0000000..1198826
--- /dev/null
+++ b/tests/hive_master.pp
@@ -0,0 +1,12 @@
+$fqdn = 'hive1.domain.org'
+class { '::cdh4::hadoop':
+ namenode_hostname => 'localhost',
+ dfs_name_dir => '/var/lib/hadoop/name',
+}
+
+class { 'cdh4::hive':
+ metastore_host => $fqdn,
+ zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+ jdbc_password => 'test',
+}
+class { 'cdh4::hive::master': }
diff --git a/tests/hive_metastore.pp b/tests/hive_metastore.pp
new file mode 100644
index 0000000..ae6fb71
--- /dev/null
+++ b/tests/hive_metastore.pp
@@ -0,0 +1,8 @@
+$fqdn = 'hive1.domain.org'
+class { 'cdh4::hive':
+ metastore_host => $fqdn,
+ zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+ jdbc_password => 'test',
+}
+class { 'cdh4::hive::metastore': }
+
diff --git a/tests/hive_metastore_mysql.pp b/tests/hive_metastore_mysql.pp
new file mode 100644
index 0000000..cebff0c
--- /dev/null
+++ b/tests/hive_metastore_mysql.pp
@@ -0,0 +1,8 @@
+$fqdn = 'hive1.domain.org'
+class { 'cdh4::hive':
+ metastore_host => $fqdn,
+ zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+ jdbc_password => 'test',
+}
+class { 'cdh4::hive::metastore::mysql': }
+
diff --git a/tests/hive_server.pp b/tests/hive_server.pp
new file mode 100644
index 0000000..5697f83
--- /dev/null
+++ b/tests/hive_server.pp
@@ -0,0 +1,13 @@
+$fqdn = 'hive1.domain.org'
+class { '::cdh4::hadoop':
+ namenode_hostname => 'localhost',
+ dfs_name_dir => '/var/lib/hadoop/name',
+}
+
+class { 'cdh4::hive':
+ metastore_host => $fqdn,
+ zookeeper_hosts => ['zk1.domain.org', 'zk2.domain.org'],
+ jdbc_password => 'test',
+}
+class { 'cdh4::hive::server': }
+
--
To view, visit https://gerrit.wikimedia.org/r/71569
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie7a024d371526d59c1f124230e3cce8342b1791c
Gerrit-PatchSet: 4
Gerrit-Project: operations/puppet/cdh4
Gerrit-Branch: master
Gerrit-Owner: Ottomata <[email protected]>
Gerrit-Reviewer: Akosiaris <[email protected]>
Gerrit-Reviewer: Faidon <[email protected]>
Gerrit-Reviewer: Ottomata <[email protected]>
Gerrit-Reviewer: jenkins-bot
_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits