Repository: griffin
Updated Branches:
  refs/heads/master f2292ac74 -> 8099cb3bf
Improve deployment manual guide

1.append detailed step's description
2.fix invalid stuffs

Author: Eugene <[email protected]>

Closes #473 from toyboxman/pr-doc.

Project: http://git-wip-us.apache.org/repos/asf/griffin/repo
Commit: http://git-wip-us.apache.org/repos/asf/griffin/commit/8099cb3b
Tree: http://git-wip-us.apache.org/repos/asf/griffin/tree/8099cb3b
Diff: http://git-wip-us.apache.org/repos/asf/griffin/diff/8099cb3b

Branch: refs/heads/master
Commit: 8099cb3bf51a716bf996b3c0583336fc60b0fd86
Parents: f2292ac
Author: Eugene <[email protected]>
Authored: Mon Dec 24 16:16:51 2018 +0800
Committer: William Guo <[email protected]>
Committed: Mon Dec 24 16:16:51 2018 +0800

----------------------------------------------------------------------
 griffin-doc/deploy/deploy-guide.md | 266 +++++++++++++++++++++++++++++---
 1 file changed, 248 insertions(+), 18 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/griffin/blob/8099cb3b/griffin-doc/deploy/deploy-guide.md
----------------------------------------------------------------------
diff --git a/griffin-doc/deploy/deploy-guide.md b/griffin-doc/deploy/deploy-guide.md
index b9f7ceb..7327b15 100644
--- a/griffin-doc/deploy/deploy-guide.md
+++ b/griffin-doc/deploy/deploy-guide.md
@@ -21,23 +21,50 @@ under the License.
 For Apache Griffin users, please follow the instructions below to deploy Apache Griffin in your environment. Note that there are some dependencies that should be installed first.
 
 ### Prerequisites
-You need to install following items
-- JDK (1.8 or later versions).
-- PostgreSQL(version 10.4) or MySQL(version 8.0.11).
-- npm (version 6.0.0+).
-- [Hadoop](http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz) (2.6.0 or later), you can get some help [here](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html).
+First, you need to install and configure the following software products; here we use [ubuntu-18.10](https://www.ubuntu.com/download) as the sample OS to prepare all dependencies.
+```bash
+# put all downloaded packages into the /apache folder
+$ mkdir /home/user/software
+$ sudo ln -s /home/user/software /apache
+$ sudo ln -s /apache/data /data
+```
+
+- JDK (1.8 or later versions)
+```bash
+$ sudo apt install openjdk-8-jre-headless
+
+$ java -version
+openjdk version "1.8.0_191"
+OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-0ubuntu0.18.10.1-b12)
+OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
+```
+
+- PostgreSQL (version 10.4) or MySQL (version 8.0.11)
+```bash
+# PostgreSQL
+$ sudo apt install postgresql-10
+
+# MySQL
+$ sudo apt install mysql-server-5.7
+```
+
+- [npm](https://nodejs.org/en/download/) (version 6.0.0+)
+```bash
+$ sudo apt install nodejs
+$ sudo apt install npm
+$ node -v
+$ npm -v
+```
+
+- [Hadoop](http://apache.claz.org/hadoop/common/) (2.6.0 or later), you can get some help [here](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html).
+
+- [Hive](http://apache.claz.org/hive/) (version 2.x), you can get some help [here](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive).
+
 - [Spark](http://spark.apache.org/downloads.html) (version 2.2.1), if you want to install Pseudo Distributed/Single Node Cluster, you can get some help [here](http://why-not-learn-something.blogspot.com/2015/06/spark-installation-pseudo.html).
-- [Hive](http://apache.claz.org/hive/hive-2.2.0/apache-hive-2.2.0-bin.tar.gz) (version 2.2.0), you can get some help [here](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive).
-  You need to make sure that your spark cluster could access your HiveContext.
+
 - [Livy](http://archive.cloudera.com/beta/livy/livy-server-0.3.0.zip), you can get some help [here](http://livy.io/quickstart.html).
-  Apache Griffin need to schedule spark jobs by server, we use livy to submit our jobs.
-  For some issues of Livy for HiveContext, we need to download 3 files or get them from Spark lib `$SPARK_HOME/lib/`, and put them into HDFS.
-  ```
-  datanucleus-api-jdo-3.2.6.jar
-  datanucleus-core-3.2.10.jar
-  datanucleus-rdbms-3.2.9.jar
-  ```
-- ElasticSearch (5.0 or later versions).
+
+- [ElasticSearch](https://www.elastic.co/downloads/elasticsearch) (5.0 or later versions).
 ElasticSearch works as a metrics collector: Apache Griffin produces metrics into it, and our default UI gets metrics from it; you can also consume them in your own way.
 
 ### Configuration
 
@@ -59,9 +86,213 @@ Create database 'quartz' in MySQL
 ```
 mysql -u <username> -e "create database quartz" -p
 ```
-Init quartz tables in MySQL using [Init_quartz_mysql_innodb.sql.sql](../../service/src/main/resources/Init_quartz_mysql_innodb.sql)
+Init quartz tables in MySQL using [Init_quartz_mysql_innodb.sql](../../service/src/main/resources/Init_quartz_mysql_innodb.sql)
 ```
-mysql -u <username> -p quartz < Init_quartz_mysql_innodb.sql.sql
+mysql -u <username> -p quartz < Init_quartz_mysql_innodb.sql
+```
+
+#### Set Env
+
+Export the variables below, or create a hadoop_env.sh file and source it from .bashrc
+```bash
+#!/bin/bash
+export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+
+export HADOOP_HOME=/apache/hadoop
+export HADOOP_COMMON_HOME=/apache/hadoop
+export HADOOP_COMMON_LIB_NATIVE_DIR=/apache/hadoop/lib/native
+export HADOOP_HDFS_HOME=/apache/hadoop
+export HADOOP_INSTALL=/apache/hadoop
+export HADOOP_MAPRED_HOME=/apache/hadoop
+export HADOOP_USER_CLASSPATH_FIRST=true
+export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
+export SPARK_HOME=/apache/spark
+export LIVY_HOME=/apache/livy
+export HIVE_HOME=/apache/hive
+export YARN_HOME=/apache/hadoop
+export SCALA_HOME=/apache/scala
+```
+
+#### Hadoop
+
+* **update configuration**
+
+Here are sample configurations for Hadoop.<br>
+Put site-specific property overrides in this
file **/apache/hadoop/etc/hadoop/core-site.xml**
+```xml
+<configuration>
+    <property>
+        <name>fs.defaultFS</name>
+        <value>hdfs://127.0.0.1:9000</value>
+    </property>
+</configuration>
+```
+
+Put site-specific property overrides in this file **/apache/hadoop/etc/hadoop/hdfs-site.xml**
+```xml
+<configuration>
+    <property>
+        <name>dfs.namenode.logging.level</name>
+        <value>warn</value>
+    </property>
+    <property>
+        <name>dfs.replication</name>
+        <value>1</value>
+    </property>
+    <property>
+        <name>dfs.namenode.servicerpc-address</name>
+        <value>127.0.0.1:9001</value>
+    </property>
+    <property>
+        <name>dfs.namenode.rpc-address</name>
+        <value>127.0.0.1:9002</value>
+    </property>
+    <property>
+        <name>dfs.namenode.name.dir</name>
+        <value>file:///data/hadoop-data/nn</value>
+    </property>
+    <property>
+        <name>dfs.datanode.data.dir</name>
+        <value>file:///data/hadoop-data/dn</value>
+    </property>
+    <property>
+        <name>dfs.namenode.checkpoint.dir</name>
+        <value>file:///data/hadoop-data/snn</value>
+    </property>
+    <property>
+        <name>dfs.webhdfs.enabled</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>dfs.datanode.use.datanode.hostname</name>
+        <value>false</value>
+    </property>
+    <property>
+        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
+        <value>false</value>
+    </property>
+</configuration>
+```
+
+* **start/stop hadoop nodes**
+```bash
+# format name node
+/apache/hadoop/bin/hdfs namenode -format
+# start namenode/datanode
+/apache/hadoop/sbin/start-dfs.sh
+# stop all nodes
+/apache/hadoop/sbin/stop-all.sh
+```
+* **start/stop hadoop ResourceManager**
+```bash
+# manually clear the ResourceManager state store
+/apache/hadoop/bin/yarn resourcemanager -format-state-store
+# startup the ResourceManager
+/apache/hadoop/sbin/yarn-daemon.sh start resourcemanager
+# stop the ResourceManager
+/apache/hadoop/sbin/yarn-daemon.sh stop resourcemanager
+```
+* **start/stop hadoop NodeManager**
+```bash
+# startup the NodeManager
+/apache/hadoop/sbin/yarn-daemon.sh start nodemanager
+# stop the NodeManager
+/apache/hadoop/sbin/yarn-daemon.sh stop nodemanager
+```
+* **start/stop hadoop HistoryServer**
+```bash
+# startup the HistoryServer
+/apache/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
+# stop the HistoryServer
+/apache/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
+```
+
+#### Hive
+You need to make sure that your Spark cluster can access your HiveContext.
+* **update configuration**
+Copy hive/conf/hive-site.xml.template to hive/conf/hive-site.xml and update some fields.
+```xml
++++ hive/conf/hive-site.xml 2018-12-16 11:17:51.000000000 +0800
+@@ -368,7 +368,7 @@
+ </property>
+ <property>
+   <name>hive.metastore.uris</name>
+-  <value/>
++  <value>thrift://127.0.0.1:9083</value>
+   <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
+ </property>
+ <property>
+@@ -527,7 +527,7 @@
+ </property>
+ <property>
+   <name>javax.jdo.option.ConnectionPassword</name>
+-  <value>mine</value>
++  <value>secret</value>
+   <description>password to use against metastore database</description>
+ </property>
+ <property>
+@@ -542,7 +542,7 @@
+ </property>
+ <property>
+   <name>javax.jdo.option.ConnectionURL</name>
+-  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
++  <value>jdbc:postgresql://127.0.0.1/myDB?ssl=false</value>
+   <description>
+     JDBC connect string for a JDBC metastore.
+     To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
+@@ -1017,7 +1017,7 @@
+ </property>
+ <property>
+   <name>javax.jdo.option.ConnectionDriverName</name>
+-  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
++  <value>org.postgresql.Driver</value>
+   <description>Driver class name for a JDBC metastore</description>
+ </property>
+ <property>
+@@ -1042,7 +1042,7 @@
+ </property>
+ <property>
+   <name>javax.jdo.option.ConnectionUserName</name>
+-  <value>APP</value>
++  <value>king</value>
+   <description>Username to use against metastore database</description>
+ </property>
+ <property>
+```
+
+* **start up hive metastore service**
+```bash
+# start hive metastore service
+/apache/hive/bin/hive --service metastore
+```
+
+#### Spark
+* **start up spark nodes**
+```bash
+cp /apache/hive/conf/hive-site.xml /apache/spark/conf/
+/apache/spark/sbin/start-master.sh
+/apache/spark/sbin/start-slave.sh spark://localhost:7077
+```
+
+#### Livy
+Apache Griffin needs the server to schedule Spark jobs; we use Livy to submit our jobs.
+Due to some issues of Livy with HiveContext, we need to download 3 files, or get them from the Spark lib `$SPARK_HOME/lib/`, and put them into HDFS.
+```
+datanucleus-api-jdo-3.2.6.jar
+datanucleus-core-3.2.10.jar
+datanucleus-rdbms-3.2.9.jar
+```
+* **update configuration**
+```bash
+mkdir livy/logs
+
+# update livy/conf/livy.conf
+livy.server.host = 127.0.0.1
+livy.spark.master = yarn
+livy.spark.deployMode = cluster
+livy.repl.enableHiveContext = true
+```
+* **start up livy**
+```bash
+/apache/livy/bin/livy-server
+```
 
 #### Elasticsearch
 
@@ -98,7 +329,6 @@ curl -XPUT http://es:9200/griffin -d '
 }
 '
 ```
-
 You should also modify some configurations of Apache Griffin for your environment.
 
 - <b>service/src/main/resources/application.properties</b>
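
For reference, `service/src/main/resources/application.properties` is where those environment-specific settings live. A minimal sketch is below: the datasource keys are standard Spring Boot properties, and the values simply mirror the PostgreSQL, Hive metastore, Elasticsearch, and Livy endpoints set up earlier in this guide. Treat each key name as an assumption to verify against your Griffin version, not as the definitive configuration.

```properties
# Assumed keys -- verify against your Griffin release.
# Quartz/service database (PostgreSQL from this guide; swap in the MySQL URL/driver if you chose MySQL)
spring.datasource.url=jdbc:postgresql://127.0.0.1:5432/quartz?autoReconnect=true&ssl=false
spring.datasource.username=king
spring.datasource.password=secret
spring.datasource.driver-class-name=org.postgresql.Driver

# Hive metastore, as configured in hive-site.xml above
hive.metastore.uris=thrift://127.0.0.1:9083

# Elasticsearch metrics store (assumed property names)
elasticsearch.host=127.0.0.1
elasticsearch.port=9200

# Livy endpoint for submitting measure jobs (assumed property name; 8998 is Livy's default port)
livy.uri=http://127.0.0.1:8998/batches
```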

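
Since Griffin drives its measure jobs through Livy's REST API, a quick way to sanity-check the Livy setup above is to assemble a `POST /batches` request by hand. The sketch below uses only the standard Livy batch-request fields (`file`, `className`, `args`); the jar path and main class shown are hypothetical placeholders, not values taken from this patch.

```python
import json

# livy.server.host from livy.conf above; 8998 is Livy's default port
LIVY_URL = "http://127.0.0.1:8998/batches"

def livy_batch_payload(jar_file, class_name, args):
    """Build the JSON body for Livy's POST /batches endpoint.

    'file', 'className' and 'args' are standard Livy batch fields.
    """
    return {"file": jar_file, "className": class_name, "args": args}

payload = livy_batch_payload(
    "hdfs:///griffin/griffin-measure.jar",     # placeholder jar path
    "org.apache.griffin.measure.Application",  # placeholder main class
    ["env.json", "dq.json"],                   # placeholder job arguments
)
body = json.dumps(payload)
print(body)

# To actually submit (requires the Livy server started above):
#   curl -s -X POST -H "Content-Type: application/json" -d "$body" http://127.0.0.1:8998/batches
```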