Repository: griffin
Updated Branches:
  refs/heads/master f2292ac74 -> 8099cb3bf
Improve deployment manual guide

1.append detailed step's description
2.fix invalid stuffs

Author: Eugene <[email protected]>

Closes #473 from toyboxman/pr-doc.

Project: http://git-wip-us.apache.org/repos/asf/griffin/repo
Commit: http://git-wip-us.apache.org/repos/asf/griffin/commit/8099cb3b
Tree: http://git-wip-us.apache.org/repos/asf/griffin/tree/8099cb3b
Diff: http://git-wip-us.apache.org/repos/asf/griffin/diff/8099cb3b

Branch: refs/heads/master
Commit: 8099cb3bf51a716bf996b3c0583336fc60b0fd86
Parents: f2292ac
Author: Eugene <[email protected]>
Authored: Mon Dec 24 16:16:51 2018 +0800
Committer: William Guo <[email protected]>
Committed: Mon Dec 24 16:16:51 2018 +0800

----------------------------------------------------------------------
 griffin-doc/deploy/deploy-guide.md | 266 +++++++++++++++++++++++++++++---
 1 file changed, 248 insertions(+), 18 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/griffin/blob/8099cb3b/griffin-doc/deploy/deploy-guide.md
----------------------------------------------------------------------
diff --git a/griffin-doc/deploy/deploy-guide.md b/griffin-doc/deploy/deploy-guide.md
index b9f7ceb..7327b15 100644
--- a/griffin-doc/deploy/deploy-guide.md
+++ b/griffin-doc/deploy/deploy-guide.md
@@ -21,23 +21,50 @@ under the License.
 For Apache Griffin users, please follow the instructions below to deploy Apache Griffin in your environment. Note that there are some dependencies that should be installed first.
 
 ### Prerequisites
-You need to install following items
-- JDK (1.8 or later versions).
-- PostgreSQL(version 10.4) or MySQL(version 8.0.11).
-- npm (version 6.0.0+).
-- [Hadoop](http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz) (2.6.0 or later), you can get some help [here](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html).
+First, you need to install and configure the following software products; here we use [ubuntu-18.10](https://www.ubuntu.com/download) as the sample OS to prepare all dependencies.
+```bash
+# put all downloaded packages into the /apache folder
+$ mkdir /home/user/software
+$ sudo ln -s /home/user/software /apache
+$ sudo ln -s /apache/data /data
+```
+
+- JDK (1.8 or later versions)
+```bash
+$ sudo apt install openjdk-8-jre-headless
+
+$ java -version
+openjdk version "1.8.0_191"
+OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-0ubuntu0.18.10.1-b12)
+OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
+```
+
+- PostgreSQL (version 10.4) or MySQL (version 8.0.11)
+```bash
+# PostgreSQL
+$ sudo apt install postgresql-10
+
+# MySQL
+$ sudo apt install mysql-server-5.7
+```
+
+- [npm](https://nodejs.org/en/download/) (version 6.0.0+)
+```bash
+$ sudo apt install nodejs
+$ sudo apt install npm
+$ node -v
+$ npm -v
+```
+
+- [Hadoop](http://apache.claz.org/hadoop/common/) (2.6.0 or later), you can get some help [here](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html).
+
+- [Hive](http://apache.claz.org/hive/) (version 2.x), you can get some help [here](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive).
+
 - [Spark](http://spark.apache.org/downloads.html) (version 2.2.1), if you want to install Pseudo Distributed/Single Node Cluster, you can get some help [here](http://why-not-learn-something.blogspot.com/2015/06/spark-installation-pseudo.html).
-- [Hive](http://apache.claz.org/hive/hive-2.2.0/apache-hive-2.2.0-bin.tar.gz) (version 2.2.0), you can get some help [here](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive).
-  You need to make sure that your spark cluster could access your HiveContext.
+
 - [Livy](http://archive.cloudera.com/beta/livy/livy-server-0.3.0.zip), you can get some help [here](http://livy.io/quickstart.html).
-  Apache Griffin need to schedule spark jobs by server, we use livy to submit our jobs.
-  For some issues of Livy for HiveContext, we need to download 3 files or get them from Spark lib `$SPARK_HOME/lib/`, and put them into HDFS.
-  ```
-  datanucleus-api-jdo-3.2.6.jar
-  datanucleus-core-3.2.10.jar
-  datanucleus-rdbms-3.2.9.jar
-  ```
-- ElasticSearch (5.0 or later versions).
+
+- [ElasticSearch](https://www.elastic.co/downloads/elasticsearch) (5.0 or later versions).
 ElasticSearch works as a metrics collector: Apache Griffin produces metrics into it, and our default UI gets metrics from it; you can also consume them in your own way.
 
 ### Configuration
 
@@ -59,9 +86,213 @@ Create database 'quartz' in MySQL
 ```
 mysql -u <username> -e "create database quartz" -p
 ```
-Init quartz tables in MySQL using [Init_quartz_mysql_innodb.sql.sql](../../service/src/main/resources/Init_quartz_mysql_innodb.sql)
+Init quartz tables in MySQL using [Init_quartz_mysql_innodb.sql](../../service/src/main/resources/Init_quartz_mysql_innodb.sql)
 ```
-mysql -u <username> -p quartz < Init_quartz_mysql_innodb.sql.sql
+mysql -u <username> -p quartz < Init_quartz_mysql_innodb.sql
+```
+
+#### Set Env
+
+Export the variables below, or create a hadoop_env.sh file and source it from .bashrc
+```bash
+#!/bin/bash
+export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+
+export HADOOP_HOME=/apache/hadoop
+export HADOOP_COMMON_HOME=/apache/hadoop
+export HADOOP_COMMON_LIB_NATIVE_DIR=/apache/hadoop/lib/native
+export HADOOP_HDFS_HOME=/apache/hadoop
+export HADOOP_INSTALL=/apache/hadoop
+export HADOOP_MAPRED_HOME=/apache/hadoop
+export HADOOP_USER_CLASSPATH_FIRST=true
+export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
+export SPARK_HOME=/apache/spark
+export LIVY_HOME=/apache/livy
+export HIVE_HOME=/apache/hive
+export YARN_HOME=/apache/hadoop
+export SCALA_HOME=/apache/scala
+```
+
+#### Hadoop
+
+* **update configuration**
+
+Here are sample configurations for Hadoop.<br>
+Put site-specific property overrides in this
file **/apache/hadoop/etc/hadoop/core-site.xml**
+```xml
+<configuration>
+    <property>
+        <name>fs.defaultFS</name>
+        <value>hdfs://127.0.0.1:9000</value>
+    </property>
+</configuration>
+```
+
+Put site-specific property overrides in this file **/apache/hadoop/etc/hadoop/hdfs-site.xml**
+```xml
+<configuration>
+    <property>
+        <name>dfs.namenode.logging.level</name>
+        <value>warn</value>
+    </property>
+    <property>
+        <name>dfs.replication</name>
+        <value>1</value>
+    </property>
+    <property>
+        <name>dfs.namenode.servicerpc-address</name>
+        <value>127.0.0.1:9001</value>
+    </property>
+    <property>
+        <name>dfs.namenode.rpc-address</name>
+        <value>127.0.0.1:9002</value>
+    </property>
+    <property>
+        <name>dfs.namenode.name.dir</name>
+        <value>file:///data/hadoop-data/nn</value>
+    </property>
+    <property>
+        <name>dfs.datanode.data.dir</name>
+        <value>file:///data/hadoop-data/dn</value>
+    </property>
+    <property>
+        <name>dfs.namenode.checkpoint.dir</name>
+        <value>file:///data/hadoop-data/snn</value>
+    </property>
+    <property>
+        <name>dfs.webhdfs.enabled</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>dfs.datanode.use.datanode.hostname</name>
+        <value>false</value>
+    </property>
+    <property>
+        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
+        <value>false</value>
+    </property>
+</configuration>
+```
+
+* **start/stop hadoop nodes**
+```bash
+# format name node
+/apache/hadoop/bin/hdfs namenode -format
+# start namenode/datanode
+/apache/hadoop/sbin/start-dfs.sh
+# stop all nodes
+/apache/hadoop/sbin/stop-all.sh
+```
+* **start/stop hadoop ResourceManager**
+```bash
+# manually clear the ResourceManager state store
+/apache/hadoop/bin/yarn resourcemanager -format-state-store
+# startup the ResourceManager
+/apache/hadoop/sbin/yarn-daemon.sh start resourcemanager
+# stop the ResourceManager
+/apache/hadoop/sbin/yarn-daemon.sh stop resourcemanager
+```
+* **start/stop hadoop NodeManager**
+```bash
+# startup the NodeManager
+/apache/hadoop/sbin/yarn-daemon.sh start nodemanager
+# stop the NodeManager
+/apache/hadoop/sbin/yarn-daemon.sh stop nodemanager
+```
+* **start/stop hadoop HistoryServer**
+```bash
+# startup the HistoryServer
+/apache/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
+# stop the HistoryServer
+/apache/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
+```
+
+#### Hive
+You need to make sure that your Spark cluster can access your HiveContext.
+* **update configuration**
+Copy hive/conf/hive-site.xml.template to hive/conf/hive-site.xml and update some fields.
+```xml
++++ hive/conf/hive-site.xml 2018-12-16 11:17:51.000000000 +0800
+@@ -368,7 +368,7 @@
+ </property>
+ <property>
+   <name>hive.metastore.uris</name>
+-  <value/>
++  <value>thrift://127.0.0.1:9083</value>
+   <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
+ </property>
+ <property>
+@@ -527,7 +527,7 @@
+ </property>
+ <property>
+   <name>javax.jdo.option.ConnectionPassword</name>
+-  <value>mine</value>
++  <value>secret</value>
+   <description>password to use against metastore database</description>
+ </property>
+ <property>
+@@ -542,7 +542,7 @@
+ </property>
+ <property>
+   <name>javax.jdo.option.ConnectionURL</name>
+-  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
++  <value>jdbc:postgresql://127.0.0.1/myDB?ssl=false</value>
+   <description>
+     JDBC connect string for a JDBC metastore.
+     To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
+@@ -1017,7 +1017,7 @@
+ </property>
+ <property>
+   <name>javax.jdo.option.ConnectionDriverName</name>
+-  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
++  <value>org.postgresql.Driver</value>
+   <description>Driver class name for a JDBC metastore</description>
+ </property>
+ <property>
+@@ -1042,7 +1042,7 @@
+ </property>
+ <property>
+   <name>javax.jdo.option.ConnectionUserName</name>
+-  <value>APP</value>
++  <value>king</value>
+   <description>Username to use against metastore database</description>
+ </property>
+ <property>
+```
+
+* **start up hive metastore service**
+```bash
+# start hive metastore service
+/apache/hive/bin/hive --service metastore
+```
+
+#### Spark
+* **start up spark nodes**
+```bash
+cp /apache/hive/conf/hive-site.xml /apache/spark/conf/
+/apache/spark/sbin/start-master.sh
+/apache/spark/sbin/start-slave.sh spark://localhost:7077
+```
+
+#### Livy
+Apache Griffin needs the server to schedule Spark jobs; we use Livy to submit our jobs.
+Due to some issues of Livy with HiveContext, we need to download 3 files, or get them from the Spark lib `$SPARK_HOME/lib/`, and put them into HDFS.
+```
+datanucleus-api-jdo-3.2.6.jar
+datanucleus-core-3.2.10.jar
+datanucleus-rdbms-3.2.9.jar
+```
+* **update configuration**
+```bash
+mkdir livy/logs
+
+# update livy/conf/livy.conf
+livy.server.host = 127.0.0.1
+livy.spark.master = yarn
+livy.spark.deployMode = cluster
+livy.repl.enableHiveContext = true
+```
+* **start up livy**
+```bash
+/apache/livy/bin/livy-server
+```
 
 #### Elasticsearch
 
@@ -98,7 +329,6 @@ curl -XPUT http://es:9200/griffin -d '
 }
 '
 ```
-
 You should also modify some configurations of Apache Griffin for your environment.
 
 - <b>service/src/main/resources/application.properties</b>
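
For reference, `service/src/main/resources/application.properties` is where those environment-specific settings live. A minimal sketch is below: the datasource keys are standard Spring Boot properties, and the values simply mirror the PostgreSQL, Hive metastore, Elasticsearch, and Livy endpoints set up earlier in this guide. Treat each key name as an assumption to verify against your Griffin version, not as the definitive configuration.

```properties
# Assumed keys -- verify against your Griffin release.
# Quartz/service database (PostgreSQL from this guide; swap in the MySQL URL/driver if you chose MySQL)
spring.datasource.url=jdbc:postgresql://127.0.0.1:5432/quartz?autoReconnect=true&ssl=false
spring.datasource.username=king
spring.datasource.password=secret
spring.datasource.driver-class-name=org.postgresql.Driver

# Hive metastore, as configured in hive-site.xml above
hive.metastore.uris=thrift://127.0.0.1:9083

# Elasticsearch metrics store (assumed property names)
elasticsearch.host=127.0.0.1
elasticsearch.port=9200

# Livy endpoint for submitting measure jobs (assumed property name; 8998 is Livy's default port)
livy.uri=http://127.0.0.1:8998/batches
```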

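
Since Griffin drives its measure jobs through Livy's REST API, a quick way to sanity-check the Livy setup above is to assemble a `POST /batches` request by hand. The sketch below uses only the standard Livy batch-request fields (`file`, `className`, `args`); the jar path and main class shown are hypothetical placeholders, not values taken from this patch.

```python
import json

# livy.server.host from livy.conf above; 8998 is Livy's default port
LIVY_URL = "http://127.0.0.1:8998/batches"

def livy_batch_payload(jar_file, class_name, args):
    """Build the JSON body for Livy's POST /batches endpoint.

    'file', 'className' and 'args' are standard Livy batch fields.
    """
    return {"file": jar_file, "className": class_name, "args": args}

payload = livy_batch_payload(
    "hdfs:///griffin/griffin-measure.jar",     # placeholder jar path
    "org.apache.griffin.measure.Application",  # placeholder main class
    ["env.json", "dq.json"],                   # placeholder job arguments
)
body = json.dumps(payload)
print(body)

# To actually submit (requires the Livy server started above):
#   curl -s -X POST -H "Content-Type: application/json" -d "$body" http://127.0.0.1:8998/batches
```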