Repository: incubator-hivemall
Updated Branches:
  refs/heads/master 0c4798eba -> bffd2c78d
Close #68: [HIVEMALL-84] Add Docker Support

Project: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/commit/bffd2c78
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/tree/bffd2c78
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/diff/bffd2c78

Branch: refs/heads/master
Commit: bffd2c78d6a938b3ac9b6b5d5f5c4d40a7cfffbf
Parents: 0c4798e
Author: Ryuichi Ito <m...@sapphire.in.net>
Authored: Tue Apr 25 21:32:04 2017 +0900
Committer: myui <yuin...@gmail.com>
Committed: Tue Apr 25 21:32:57 2017 +0900

----------------------------------------------------------------------
 .dockerignore                                |  6 ++
 docs/gitbook/SUMMARY.md                      |  6 +-
 docs/gitbook/docker/getting_started.md       | 68 +++++++++++++++++++++++
 docs/gitbook/getting_started/installation.md | 11 ++++
 resources/docker/Dockerfile                  | 63 +++++++++++++++++++++
 resources/docker/docker-compose.yml          | 19 +++++++
 resources/docker/etc/hadoop/core-site.xml    |  8 +++
 resources/docker/etc/hadoop/hdfs-site.xml    |  8 +++
 resources/docker/etc/hadoop/mapred-site.xml  |  7 +++
 resources/docker/etc/hadoop/yarn-site.xml    |  7 +++
 resources/docker/home/.hiverc                |  2 +
 resources/docker/home/bin/init.sh            |  7 +++
 resources/docker/home/bin/prepare_iris.sh    | 17 ++++++
 13 files changed, 228 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/.dockerignore
----------------------------------------------------------------------
diff --git a/.dockerignore b/.dockerignore
new file mode 100644
index 0000000..21e9e02
--- /dev/null
+++ b/.dockerignore
@@ -0,0 +1,6 @@
+.dockerignore
+resources/docker/Dockerfile
+resources/docker/docker-compose.yml
+.git/
+target/
+.*.swp

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/docs/gitbook/SUMMARY.md
----------------------------------------------------------------------
diff --git
a/docs/gitbook/SUMMARY.md b/docs/gitbook/SUMMARY.md
index b31018c..695119a 100644
--- a/docs/gitbook/SUMMARY.md
+++ b/docs/gitbook/SUMMARY.md
@@ -173,7 +173,11 @@
 * [Top-k Join processing](spark/misc/topk_join.md)
 * [Other utility functions](spark/misc/functions.md)
 
-## Part XIII - External References
+## Part XIII - Hivemall on Docker
+
+* [Getting Started](docker/getting_started.md)
+
+## Part XIV - External References
 
 * [Hivemall on Apache Spark](https://github.com/maropu/hivemall-spark)
 * [Hivemall on Apache Pig](https://github.com/daijyc/hivemall/wiki/PigHome)

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/docs/gitbook/docker/getting_started.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/docker/getting_started.md b/docs/gitbook/docker/getting_started.md
new file mode 100644
index 0000000..31e14b2
--- /dev/null
+++ b/docs/gitbook/docker/getting_started.md
@@ -0,0 +1,68 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements. See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership. The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied. See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+This page introduces how to run Hivemall on Docker.
+
+<!-- toc -->
+
+> #### Caution
+> This docker image contains a single-node Hadoop environment for evaluating Hivemall. It is not suited for production use.
+
+# Requirements
+
+ * Docker Engine 1.6+
+ * Docker Compose 1.10+
+
+# 1. Build image
+
+## Build using docker-compose
+
+ `docker-compose -f resources/docker/docker-compose.yml build`
+
+## Build using docker engine
+
+ `docker build -f resources/docker/Dockerfile .`
+
+# 2. Run container
+
+## Run by docker-compose
+
+ 1. Edit `resources/docker/docker-compose.yml` if needed.
+ 2. `docker-compose -f resources/docker/docker-compose.yml up -d && docker attach hivemall`
+
+## Run by docker command
+
+ 1. Find a local docker image by `docker images`.
+ 2. Run `docker run -it ${docker_image_id}`.
+    Refer to the [Docker reference](https://docs.docker.com/engine/reference/run/) for command details.
+
+# 3. Run Hivemall on Docker
+
+ 1. Type `hive` to start Hive (`.hiverc` loads the Hivemall functions).
+ 2. Try your Hivemall queries!
+
+## Load data into HDFS (optional)
+
+ You can find an example script to load data into HDFS in `./bin/prepare_iris.sh`.
+ The script loads the iris dataset into the `iris` database.
+
+## Build Hivemall (optional)
+
+ In the container, the Hivemall resources are stored in `$HIVEMALL_PATH`.
+ You can build the Hivemall package by running `cd $HIVEMALL_PATH && ./bin/build.sh`.

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/docs/gitbook/getting_started/installation.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/getting_started/installation.md b/docs/gitbook/getting_started/installation.md
index 896d247..ee07afb 100644
--- a/docs/gitbook/getting_started/installation.md
+++ b/docs/gitbook/getting_started/installation.md
@@ -44,6 +44,17 @@ add jar /tmp/hivemall-core-xxx-with-dependencies.jar;
 source /tmp/define-all.hive;
 ```
 
+
+Other choices
+=============
+
+You can also run Hivemall on the following platforms:
+
+* [Apache Spark](../spark/getting_started/installation.md)
+* [Apache Pig](https://github.com/daijyc/hivemall/wiki/PigHome)
+* [Apache Hive on Docker](../docker/getting_started.md) for testing
+
+
 Build from Source
 ==================
 

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/resources/docker/Dockerfile
----------------------------------------------------------------------
diff --git a/resources/docker/Dockerfile b/resources/docker/Dockerfile
new file mode 100644
index 0000000..54b04dc
--- /dev/null
+++ b/resources/docker/Dockerfile
@@ -0,0 +1,63 @@
+FROM openjdk:7
+
+WORKDIR /root/
+
+ARG PREBUILD=true
+ARG HADOOP_VERSION=2.7.3
+ARG HIVE_VERSION=2.1.1
+
+ENV BASE_URL='https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename='
+ENV HADOOP_HOME='/usr/local/hadoop'
+ENV HIVE_HOME='/usr/local/hive'
+ENV HIVEMALL_PATH='/opt/hivemall'
+ENV HADOOP_OPTS=' \
+  -Dsystem:java.io.tmpdir=/tmp \
+  -Dsystem:user.name=root \
+  -Dderby.stream.error.file=/root/derby.log'
+ENV PATH="${HADOOP_HOME}/bin:${HIVE_HOME}/bin:${PATH}"
+
+COPY .
${HIVEMALL_PATH}/
+
+RUN set -eux && \
+    apt update && \
+    apt install -y --no-install-recommends openssh-server maven ruby npm && \
+    ln -s /usr/bin/nodejs /usr/bin/node && \
+    npm install -g gitbook-cli && \
+    \
+    wget ${BASE_URL}hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz -O - \
+      | tar xz && \
+    mv hadoop-${HADOOP_VERSION} ${HADOOP_HOME} && \
+    sed -i -e 's!${JAVA_HOME}!'"${JAVA_HOME}!" ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh && \
+    ssh-keygen -q -P '' -f ~/.ssh/id_rsa && \
+    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
+    echo 'host *\n StrictHostKeyChecking no' > ~/.ssh/config && \
+    mv ${HIVEMALL_PATH}/resources/docker/etc/hadoop/*.xml ${HADOOP_HOME}/etc/hadoop && \
+    hdfs namenode -format && \
+    \
+    wget ${BASE_URL}hive/hive-${HIVE_VERSION}/apache-hive-${HIVE_VERSION}-bin.tar.gz -O - \
+      | tar xz && \
+    mv apache-hive-${HIVE_VERSION}-bin ${HIVE_HOME} && \
+    cat ${HIVE_HOME}/conf/hive-default.xml.template \
+      | sed -e 's!databaseName=metastore_db!databaseName=/root/metastore_db!'
\
+      > ${HIVE_HOME}/conf/hive-site.xml && \
+    \
+    cd ${HIVEMALL_PATH} && \
+    HIVEMALL_VERSION=`cat VERSION` && \
+    mkdir -p /root/bin /root/hivemall && \
+    find ${HIVEMALL_PATH}/resources/docker/home/bin -mindepth 1 -maxdepth 1 \
+      -exec sh -c 'f={} && ln -s $f /root/bin/${f##*/}' \; && \
+    ln -s ${HIVEMALL_PATH}/resources/docker/home/.hiverc /root && \
+    ln -s ${HIVEMALL_PATH}/resources/ddl/define-all.hive /root/hivemall/define-all.hive && \
+    ln -s ${HIVEMALL_PATH}/target/hivemall-core-${HIVEMALL_VERSION}-with-dependencies.jar \
+      /root/hivemall/hivemall-core-with-dependencies.jar && \
+    \
+    (if ${PREBUILD}; then \
+      mvn package -Dmaven.test.skip=true -pl core; \
+    fi) && \
+    \
+    rm -rf /var/cache/apt/archives/* /var/lib/apt/lists/* /root/.m2/* /root/.npm/*
+
+VOLUME ["/opt/hivemall/", "/root/data/"]
+EXPOSE 8088 19888 50070
+
+CMD ["sh", "-c", "./bin/init.sh && bash"]

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/resources/docker/docker-compose.yml
----------------------------------------------------------------------
diff --git a/resources/docker/docker-compose.yml b/resources/docker/docker-compose.yml
new file mode 100644
index 0000000..efda053
--- /dev/null
+++ b/resources/docker/docker-compose.yml
@@ -0,0 +1,19 @@
+version: '2'
+services:
+  hivemall:
+    build:
+      context: ../../
+      dockerfile: resources/docker/Dockerfile
+      args:
+        - PREBUILD=false
+    image: hivemall
+    container_name: hivemall
+    ports:
+      - "8088:8088" # ResourceManager
+      - "19888:19888" # JobHistoryServer
+      - "50070:50070" # NameNode
+    volumes:
+      - "../../:/opt/hivemall/" # mount current hivemall dir
+      #- "/path/to/data/:/root/data/" # mount resources
+    tty: true
+    stdin_open: true

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/resources/docker/etc/hadoop/core-site.xml
----------------------------------------------------------------------
diff --git a/resources/docker/etc/hadoop/core-site.xml b/resources/docker/etc/hadoop/core-site.xml
new file
mode 100644
index 0000000..1cbd950
--- /dev/null
+++ b/resources/docker/etc/hadoop/core-site.xml
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<configuration>
+  <property>
+    <name>fs.defaultFS</name>
+    <value>hdfs://localhost:9000</value>
+  </property>
+</configuration>

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/resources/docker/etc/hadoop/hdfs-site.xml
----------------------------------------------------------------------
diff --git a/resources/docker/etc/hadoop/hdfs-site.xml b/resources/docker/etc/hadoop/hdfs-site.xml
new file mode 100644
index 0000000..98c8849
--- /dev/null
+++ b/resources/docker/etc/hadoop/hdfs-site.xml
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<configuration>
+  <property>
+    <name>dfs.replication</name>
+    <value>1</value>
+  </property>
+</configuration>

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/resources/docker/etc/hadoop/mapred-site.xml
----------------------------------------------------------------------
diff --git a/resources/docker/etc/hadoop/mapred-site.xml b/resources/docker/etc/hadoop/mapred-site.xml
new file mode 100644
index 0000000..a115f99
--- /dev/null
+++ b/resources/docker/etc/hadoop/mapred-site.xml
@@ -0,0 +1,7 @@
+<?xml version="1.0"?>
+<configuration>
+  <property>
+    <name>mapreduce.framework.name</name>
+    <value>yarn</value>
+  </property>
+</configuration>

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/resources/docker/etc/hadoop/yarn-site.xml
----------------------------------------------------------------------
diff --git a/resources/docker/etc/hadoop/yarn-site.xml b/resources/docker/etc/hadoop/yarn-site.xml
new file mode 100644
index 0000000..98b3c50
--- /dev/null
+++ b/resources/docker/etc/hadoop/yarn-site.xml
@@ -0,0 +1,7 @@
+<?xml version="1.0"?>
+<configuration>
+  <property>
+
<name>yarn.nodemanager.aux-services</name>
+    <value>mapreduce_shuffle</value>
+  </property>
+</configuration>

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/resources/docker/home/.hiverc
----------------------------------------------------------------------
diff --git a/resources/docker/home/.hiverc b/resources/docker/home/.hiverc
new file mode 100644
index 0000000..0030939
--- /dev/null
+++ b/resources/docker/home/.hiverc
@@ -0,0 +1,2 @@
+add jar /root/hivemall/hivemall-core-with-dependencies.jar;
+source /root/hivemall/define-all.hive;

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/resources/docker/home/bin/init.sh
----------------------------------------------------------------------
diff --git a/resources/docker/home/bin/init.sh b/resources/docker/home/bin/init.sh
new file mode 100755
index 0000000..c9dc180
--- /dev/null
+++ b/resources/docker/home/bin/init.sh
@@ -0,0 +1,7 @@
+#!/bin/sh -eux
+
+/etc/init.d/ssh start
+$HADOOP_HOME/sbin/start-dfs.sh
+$HADOOP_HOME/sbin/start-yarn.sh
+$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
+schematool -initSchema -dbType derby

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/bffd2c78/resources/docker/home/bin/prepare_iris.sh
----------------------------------------------------------------------
diff --git a/resources/docker/home/bin/prepare_iris.sh b/resources/docker/home/bin/prepare_iris.sh
new file mode 100755
index 0000000..944de08
--- /dev/null
+++ b/resources/docker/home/bin/prepare_iris.sh
@@ -0,0 +1,17 @@
+#!/bin/sh -eux
+
+DATA_DIR='/root/data'
+HDFS_DATA_DIR='/dataset/iris/raw'
+DATA='iris.data'
+mkdir -p $DATA_DIR
+[ -f $DATA_DIR/$DATA ] || wget http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data -O $DATA_DIR/$DATA
+hadoop fs -mkdir -p $HDFS_DATA_DIR
+awk -F',' 'NF >0 {OFS="|"; print NR,$5,$1","$2","$3","$4}' $DATA_DIR/$DATA \
+  | hadoop fs -put - $HDFS_DATA_DIR/$DATA
+hive -e " \
+  create database if not exists
iris; \
+  use iris; \
+  create external table iris_raw (rowid int, label string, features array<float>) \
+  row format delimited fields terminated by '|' \
+  collection items terminated by ',' \
+  stored as textfile location \"$HDFS_DATA_DIR\";"
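
To see what the `awk` step in `prepare_iris.sh` emits before it is piped into HDFS, the following minimal sketch can be run without Hadoop or Hive. The function name `iris_to_hive_rows` and the two sample rows are assumptions made for illustration; only the `awk` program itself is taken from the script in this commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): wraps the awk program from
# prepare_iris.sh so that its output can be inspected locally.
iris_to_hive_rows() {
  # Skip empty lines (NF > 0), then print: rowid | label | comma-joined features.
  # NR is the input line number, so it doubles as the row id.
  awk -F',' 'NF > 0 {OFS="|"; print NR,$5,$1","$2","$3","$4}'
}

# Two sample rows in the UCI iris.data layout, followed by a blank line.
printf '5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n\n' \
  | iris_to_hive_rows
# Prints:
#   1|Iris-setosa|5.1,3.5,1.4,0.2
#   2|Iris-setosa|4.9,3.0,1.4,0.2
```

Each output line lines up with the `iris_raw` table definition: fields are split on `|` and the `features` array on `,`. Because `NR` also counts the blank lines that are skipped, row ids stay contiguous only when blank lines occur at the end of the input.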