This is an automated email from the ASF dual-hosted git repository.

zhouquan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git


The following commit(s) were added to refs/heads/master by this push:
     new 7e5fb05  SUBMARINE-322. Add yarn runtime examples in installation 
guide.
7e5fb05 is described below

commit 7e5fb05e1eb3489e614cffae5c62827e2143df94
Author: Zac Zhou <[email protected]>
AuthorDate: Fri Dec 27 21:17:34 2019 +0800

    SUBMARINE-322. Add yarn runtime examples in installation guide.
    
    ### What is this PR for?
    Yarn runtime is used by default instead of yarn native service since submarine 0.3.0.
    A guide on how to run a yarn runtime job should be added to the submarine installation guide.
    
    ### What type of PR is it?
    Documentation
    
    ### What is the Jira issue?
    https://issues.apache.org/jira/browse/SUBMARINE-322
    
    ### How should this be tested?
    https://travis-ci.org/yuanzac/hadoop-submarine/builds/630008273
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need updating? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? Yes
    
    Author: Zac Zhou <[email protected]>
    
    Closes #134 from yuanzac/topic/SUBMARINE-322 and squashes the following 
commits:
    
    38b3b73 [Zac Zhou] Submarine-322. Add yarn runtime examples in installation 
guide.
    e54c4a4 [Zac Zhou] SUBMARINE-322. Add yarn runtime examples in installation 
guide.
---
 .../submarine/run_submarine_mnist_tony.sh          |   7 +-
 .../submarine-installer/InstallationGuide.md       | 707 -----------------
 .../InstallationGuideChineseVersion.md             | 845 ---------------------
 dev-support/submarine-installer/README-CN.md       |   8 +-
 dev-support/submarine-installer/README.md          |  10 +-
 docs/README.md                                     |   4 +-
 docs/helper/InstallationGuide.md                   | 365 ++++-----
 docs/helper/InstallationGuideChineseVersion.md     | 367 ++++-----
 docs/helper/QuickStart.md                          |   2 +-
 ...ningDistributedCifar10TFJobsWithYarnService.md} |  31 +-
 ...nningSingleNodeCifar10PTJobsWithYarnService.md} |  16 +-
 .../ubuntu-16.04/Dockerfile.gpu.pytorch_latest     |   0
 .../helper}/docker/pytorch/build-all.sh            |   0
 .../with-cifar10-models/cifar10_tutorial.py        |   0
 .../ubuntu-16.04/Dockerfile.gpu.pytorch_latest     |   0
 .../base/ubuntu-16.04/Dockerfile.cpu.tf_1.13.1     |   0
 .../base/ubuntu-16.04/Dockerfile.gpu.tf_1.13.1     |   0
 .../helper}/docker/tensorflow/build-all.sh         |   0
 .../ubuntu-16.04/Dockerfile.cpu.tf_1.13.1          |   0
 .../ubuntu-16.04/Dockerfile.gpu.tf_1.13.1          |   0
 .../cifar10_estimator_tf_1.13.1/README.md          |   0
 .../cifar10_estimator_tf_1.13.1/cifar10.py         |   0
 .../cifar10_estimator_tf_1.13.1/cifar10_main.py    |   0
 .../cifar10_estimator_tf_1.13.1/cifar10_model.py   |   0
 .../cifar10_estimator_tf_1.13.1/cifar10_utils.py   |   0
 .../generate_cifar10_tfrecords.py                  |   0
 .../cifar10_estimator_tf_1.13.1/model_base.py      |   0
 .../zeppelin-notebook-example/Dockerfile.gpu       |   0
 .../zeppelin-notebook-example/run_container.sh     |   0
 .../tensorflow/zeppelin-notebook-example/shiro.ini |   0
 .../zeppelin-notebook-example/zeppelin-site.xml    |   0
 31 files changed, 428 insertions(+), 1934 deletions(-)

diff --git a/dev-support/mini-submarine/submarine/run_submarine_mnist_tony.sh 
b/dev-support/mini-submarine/submarine/run_submarine_mnist_tony.sh
index 61cc3f1..f4d9dbd 100755
--- a/dev-support/mini-submarine/submarine/run_submarine_mnist_tony.sh
+++ b/dev-support/mini-submarine/submarine/run_submarine_mnist_tony.sh
@@ -49,8 +49,11 @@ fi
 
 SUBMARINE_VERSION=0.3.0-SNAPSHOT
 HADOOP_VERSION=2.9
+SUBMARINE_PATH=/opt/submarine-current
+HADOOP_CONF_PATH=/usr/local/hadoop/etc/hadoop
+MNIST_PATH=/home/yarn/submarine
 
-${JAVA_CMD} -cp 
/opt/submarine-current/submarine-all-${SUBMARINE_VERSION}-hadoop-${HADOOP_VERSION}.jar:/usr/local/hadoop/etc/hadoop
 \
+${JAVA_CMD} -cp 
${SUBMARINE_PATH}/submarine-all-${SUBMARINE_VERSION}-hadoop-${HADOOP_VERSION}.jar:${HADOOP_CONF_PATH}
 \
  org.apache.submarine.client.cli.Cli job run --name tf-job-001 \
  --framework tensorflow \
  --verbose \
@@ -62,4 +65,4 @@ ${JAVA_CMD} -cp 
/opt/submarine-current/submarine-all-${SUBMARINE_VERSION}-hadoop
  --worker_launch_cmd "${WORKER_CMD}" \
  --ps_launch_cmd "myvenv.zip/venv/bin/python mnist_distributed.py --steps 2 
--data_dir /tmp/data --working_dir /tmp/mode" \
  --insecure \
- --conf 
tony.containers.resources=/home/yarn/submarine/myvenv.zip#archive,/home/yarn/submarine/mnist_distributed.py,/opt/submarine-current/submarine-all-${SUBMARINE_VERSION}-hadoop-${HADOOP_VERSION}.jar
+ --conf 
tony.containers.resources=${MNIST_PATH}/myvenv.zip#archive,${MNIST_PATH}/mnist_distributed.py,${SUBMARINE_PATH}/submarine-all-${SUBMARINE_VERSION}-hadoop-${HADOOP_VERSION}.jar
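The hunk above extracts the hard-coded paths into variables; a minimal sketch (reusing the script's own default values) of how the `-cp` argument is composed:

```shell
#!/bin/sh
# Default values from run_submarine_mnist_tony.sh
SUBMARINE_VERSION=0.3.0-SNAPSHOT
HADOOP_VERSION=2.9
SUBMARINE_PATH=/opt/submarine-current
HADOOP_CONF_PATH=/usr/local/hadoop/etc/hadoop

# The classpath is the submarine "all" jar plus the hadoop conf directory
CLASSPATH="${SUBMARINE_PATH}/submarine-all-${SUBMARINE_VERSION}-hadoop-${HADOOP_VERSION}.jar:${HADOOP_CONF_PATH}"
echo "${CLASSPATH}"
```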
diff --git a/dev-support/submarine-installer/InstallationGuide.md 
b/dev-support/submarine-installer/InstallationGuide.md
deleted file mode 100644
index 0e3a328..0000000
--- a/dev-support/submarine-installer/InstallationGuide.md
+++ /dev/null
@@ -1,707 +0,0 @@
-# Submarine Installation Guide
-
-## Prerequisites
-
-(Please note that the following prerequisites are just examples. You can always choose to install your own kernel version, users, drivers, etc.)
-
-### Operating System
-
-The operating system and kernel versions we have tested are shown in the following table; these are the recommended minimum versions.
-
-| Environment | Version |
-| ------ | ------ |
-| Operating System | centos-release-7-3.1611.el7.centos.x86_64 |
-| Kernel | 3.10.0-514.el7.x86_64 |
-
-### User & Group
-
-Some specific users and groups are recommended for installing hadoop/docker. Please create them if they are missing.
-
-```
-adduser hdfs
-adduser mapred
-adduser yarn
-addgroup hadoop
-usermod -aG hdfs,hadoop hdfs
-usermod -aG mapred,hadoop mapred
-usermod -aG yarn,hadoop yarn
-usermod -aG hdfs,hadoop hadoop
-groupadd docker
-usermod -aG docker yarn
-usermod -aG docker hadoop
-```
-
-### GCC Version
-
-Check the version of GCC tool (to compile kernel).
-
-```bash
-gcc --version
-gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
-# install if needed
-yum install gcc make g++
-```
-
-### Kernel header & Kernel devel
-
-```bash
-# Approach 1:
-yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
-# Approach 2:
-wget 
http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-headers-3.10.0-514.el7.x86_64.rpm
-rpm -ivh kernel-headers-3.10.0-514.el7.x86_64.rpm
-```
-
-### GPU Servers (Only for Nvidia GPU equipped nodes)
-
-```
-lspci | grep -i nvidia
-
-# If the server has GPUs, you will see output like this:
-04:00.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)
-82:00.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)
-```
-
-
-
-### Nvidia Driver Installation (Only for Nvidia GPU equipped nodes)
-
-If you need to upgrade GPU drivers, make a clean installation: any previously installed nvidia driver/cuda should be uninstalled first.
-
-```
-# uninstall cuda:
-sudo /usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl
-
-# uninstall nvidia-driver:
-sudo /usr/bin/nvidia-uninstall
-```
-
-To check the GPU version, install nvidia-detect:
-
-```
-yum install nvidia-detect
-# run 'nvidia-detect -v' to get the required nvidia driver version:
-nvidia-detect -v
-Probing for supported NVIDIA devices...
-[10de:13bb] NVIDIA Corporation GM107GL [Quadro K620]
-This device requires the current xyz.nm NVIDIA driver kmod-nvidia
-[8086:1912] Intel Corporation HD Graphics 530
-An Intel display controller was also detected
-```
-
-Pay attention to `This device requires the current xyz.nm NVIDIA driver 
kmod-nvidia`.
-Download the installer like 
[NVIDIA-Linux-x86_64-390.87.run](https://www.nvidia.com/object/linux-amd64-display-archive.html).
-
-
-Some preparatory work for nvidia driver installation. (This follows the normal Nvidia GPU driver installation procedure; it is included here for your convenience.)
-
-```
-# It may take a while to update
-yum -y update
-yum -y install kernel-devel
-
-yum -y install epel-release
-yum -y install dkms
-
-# Disable nouveau
-vim /etc/default/grub
-# Add the following to the GRUB_CMDLINE_LINUX line
-rd.driver.blacklist=nouveau nouveau.modeset=0
-
-# Generate configuration
-grub2-mkconfig -o /boot/grub2/grub.cfg
-
-vim /etc/modprobe.d/blacklist.conf
-# Add configuration:
-blacklist nouveau
-
-mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
-dracut /boot/initramfs-$(uname -r).img $(uname -r)
-reboot
-```
-
-Check whether nouveau is disabled
-
-```
-lsmod | grep nouveau  # return null
-
-# install nvidia driver
-sh NVIDIA-Linux-x86_64-390.87.run
-```
-
-Some options during the installation
-
-```
-Install NVIDIA's 32-bit compatibility libraries (Yes)
-Would you like to run the nvidia-xconfig utility to automatically update your 
X configuration file... (NO)
-```
-
-
-Check nvidia driver installation
-
-```
-nvidia-smi
-```
-
-Reference:
-https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
-
-
-
-### Docker Installation
-
-We recommend Docker version >= 1.12.5. The following steps are just for your reference; you can always choose other approaches to install Docker.
-
-```
-yum -y update
-yum -y install yum-utils
-yum-config-manager --add-repo https://yum.dockerproject.org/repo/main/centos/7
-yum -y update
-
-# Show available packages
-yum search --showduplicates docker-engine
-
-# Install docker 1.12.5
-yum -y --nogpgcheck install docker-engine-1.12.5*
-systemctl start docker
-
-chown hadoop:netease /var/run/docker.sock
-chown hadoop:netease /usr/bin/docker
-```
-
-Reference: https://docs.docker.com/cs-engine/1.12/
-
-### Docker Configuration
-
-Add a file named daemon.json under /etc/docker/. Please replace the variables image_registry_ip, etcd_host_ip, localhost_ip, yarn_dns_registry_host_ip, and dns_host_ip with IPs specific to your environment.
-
-```
-{
-    "insecure-registries": ["${image_registry_ip}:5000"],
-    
"cluster-store":"etcd://${etcd_host_ip1}:2379,${etcd_host_ip2}:2379,${etcd_host_ip3}:2379",
-    "cluster-advertise":"{localhost_ip}:2375",
-    "dns": ["${yarn_dns_registry_host_ip}", "${dns_host_ip1}"],
-    "hosts": ["tcp://{localhost_ip}:2375", "unix:///var/run/docker.sock"]
-}
-```
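The placeholders above have to be filled in before docker will accept the file; a hedged sketch (hypothetical IPs, and only two of the keys) of substituting them with sed:

```shell
#!/bin/sh
# Hypothetical IPs for illustration; use your environment's values
image_registry_ip=10.0.0.10
localhost_ip=10.0.0.11

# Substitute the ${...} and {...} placeholders of the template with sed
sed -e "s|\${image_registry_ip}|${image_registry_ip}|g" \
    -e "s|{localhost_ip}|${localhost_ip}|g" \
    > daemon.json <<'EOF'
{
    "insecure-registries": ["${image_registry_ip}:5000"],
    "hosts": ["tcp://{localhost_ip}:2375", "unix:///var/run/docker.sock"]
}
EOF
cat daemon.json
```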
-
-Restart docker daemon:
-
-```
-sudo systemctl restart docker
-```
-
-
-
-### Docker EE version
-
-```bash
-$ docker version
-
-Client:
- Version:      1.12.5
- API version:  1.24
- Go version:   go1.6.4
- Git commit:   7392c3b
- Built:        Fri Dec 16 02:23:59 2016
- OS/Arch:      linux/amd64
-
-Server:
- Version:      1.12.5
- API version:  1.24
- Go version:   go1.6.4
- Git commit:   7392c3b
- Built:        Fri Dec 16 02:23:59 2016
- OS/Arch:      linux/amd64
-```
-
-### Nvidia-docker Installation (Only for Nvidia GPU equipped nodes)
-
-Submarine depends on nvidia-docker version 1.0.
-
-```
-wget -P /tmp 
https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
-sudo rpm -i /tmp/nvidia-docker*.rpm
-# Start nvidia-docker
-sudo systemctl start nvidia-docker
-
-# Check nvidia-docker status:
-systemctl status nvidia-docker
-
-# Check nvidia-docker log:
-journalctl -u nvidia-docker
-
-# Test nvidia-docker-plugin
-curl http://localhost:3476/v1.0/docker/cli
-```
-
-According to the `nvidia-driver` version, add folders under `/var/lib/nvidia-docker/volumes/nvidia_driver/`:
-
-```
-mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/390.87
-# 390.87 is the nvidia driver version
-
-mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/390.87/bin
-mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/390.87/lib64
-
-cp /usr/bin/nvidia* /var/lib/nvidia-docker/volumes/nvidia_driver/390.87/bin
-cp /usr/lib64/libcuda* 
/var/lib/nvidia-docker/volumes/nvidia_driver/390.87/lib64
-cp /usr/lib64/libnvidia* 
/var/lib/nvidia-docker/volumes/nvidia_driver/390.87/lib64
-
-# Test with nvidia-smi
-nvidia-docker run --rm nvidia/cuda:9.0-devel nvidia-smi
-```
-
-Test docker, nvidia-docker, nvidia-driver installation
-
-```
-# Test 1
-nvidia-docker run --rm nvidia/cuda nvidia-smi
-```
-
-```
-# Test 2
-nvidia-docker run -it tensorflow/tensorflow:1.9.0-gpu bash
-# In docker container
-python
-import tensorflow as tf
-tf.test.is_gpu_available()
-```
-
-[The way to uninstall nvidia-docker 
1.0](https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0))
-
-Reference:
-https://github.com/NVIDIA/nvidia-docker/tree/1.0
-
-
-### Tensorflow Image
-
-There is no need to install CUDNN and CUDA on the servers, because CUDNN and CUDA can be added to the docker images. We can get basic docker images by following WriteDockerfile.md.
-
-
-The basic Dockerfile doesn't support kerberos security. If you need kerberos, you can write a Dockerfile like this:
-
-
-```shell
-FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
-
-# Pick up some TF dependencies
-RUN apt-get update && apt-get install -y --allow-downgrades 
--no-install-recommends \
-        build-essential \
-        cuda-command-line-tools-9-0 \
-        cuda-cublas-9-0 \
-        cuda-cufft-9-0 \
-        cuda-curand-9-0 \
-        cuda-cusolver-9-0 \
-        cuda-cusparse-9-0 \
-        curl \
-        libcudnn7=7.0.5.15-1+cuda9.0 \
-        libfreetype6-dev \
-        libpng12-dev \
-        libzmq3-dev \
-        pkg-config \
-        python \
-        python-dev \
-        rsync \
-        software-properties-common \
-        unzip \
-        && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
-
-RUN export DEBIAN_FRONTEND=noninteractive && apt-get update && apt-get install 
-yq krb5-user libpam-krb5 && apt-get clean
-
-RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
-    python get-pip.py && \
-    rm get-pip.py
-
-RUN pip --no-cache-dir install \
-        Pillow \
-        h5py \
-        ipykernel \
-        jupyter \
-        matplotlib \
-        numpy \
-        pandas \
-        scipy \
-        sklearn \
-        && \
-    python -m ipykernel.kernelspec
-
-# Install TensorFlow GPU version.
-RUN pip --no-cache-dir install \
-    
http://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.8.0-cp27-none-linux_x86_64.whl
-RUN apt-get update && apt-get install git -y
-
-RUN apt-get update && apt-get install -y openjdk-8-jdk wget
-# Download hadoop-3.1.1.tar.gz
-RUN wget 
http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
-RUN tar zxf hadoop-3.1.1.tar.gz
-RUN mv hadoop-3.1.1 hadoop-3.1.0
-
-# Download jdk which supports kerberos
-RUN wget -qO jdk8.tar.gz 
'http://${kerberos_jdk_url}/jdk-8u152-linux-x64.tar.gz'
-RUN tar xzf jdk8.tar.gz -C /opt
-RUN mv /opt/jdk* /opt/java
-RUN rm jdk8.tar.gz
-RUN update-alternatives --install /usr/bin/java java /opt/java/bin/java 100
-RUN update-alternatives --install /usr/bin/javac javac /opt/java/bin/javac 100
-
-ENV JAVA_HOME /opt/java
-ENV PATH $PATH:$JAVA_HOME/bin
-```
-
-
-### Test tensorflow in a docker container
-
-After the docker image is built, we can check the Tensorflow environment before submitting a yarn job.
-
-```shell
-$ docker run -it ${docker_image_name} /bin/bash
-# >>> In the docker container
-$ python
-$ python >> import tensorflow as tf
-$ python >> tf.__version__
-```
-
-If there are errors, check the following configurations.
-
-1. LD_LIBRARY_PATH environment variable
-
-   ```
-   echo $LD_LIBRARY_PATH
-   
/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
-   ```
-
-2. The location of libcuda.so.1, libcuda.so
-
-   ```
-   ls -l /usr/local/nvidia/lib64 | grep libcuda.so
-   ```
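The checks above can be scripted; a minimal sketch (pure shell, with a hypothetical demo path) that counts the entries of a library path that are missing from disk:

```shell
#!/bin/sh
# Count the ':'-separated entries of a library path that do not exist on disk
count_missing() {
  missing=0
  for dir in $(printf '%s' "$1" | tr ':' ' '); do
    [ -d "$dir" ] || { echo "missing: $dir" >&2; missing=$((missing + 1)); }
  done
  echo "$missing"
}

# Inside the container you would run: count_missing "$LD_LIBRARY_PATH"
# Demo with a hypothetical path (one existing dir, one missing):
count_missing "/tmp:/no/such/dir"
```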
-
-
-## Hadoop Installation
-
-### Get Hadoop Release
-You can either get Hadoop release binary or compile from source code. Please 
follow the guides from [Hadoop Homepage](https://hadoop.apache.org/).
-For hadoop cluster setup, please refer to [Hadoop Cluster 
Setup](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html)
-
-
-### Start yarn service
-
-```
-YARN_LOGFILE=resourcemanager.log ./sbin/yarn-daemon.sh start resourcemanager
-YARN_LOGFILE=nodemanager.log ./sbin/yarn-daemon.sh start nodemanager
-YARN_LOGFILE=timeline.log ./sbin/yarn-daemon.sh start timelineserver
-YARN_LOGFILE=mr-historyserver.log ./sbin/mr-jobhistory-daemon.sh start 
historyserver
-```
-
-### Test with a MR wordcount job
-
-```
-./bin/hadoop jar 
/home/hadoop/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar
 wordcount /tmp/wordcount.txt /tmp/wordcount-output4
-```
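Before submitting, it can be useful to sanity-check the expected counts locally; a rough sketch (plain coreutils, not MapReduce, with a hypothetical input file) of wordcount:

```shell
#!/bin/sh
# Create a sample input; upload it with: hdfs dfs -put wordcount.txt /tmp/
printf 'hello world\nhello submarine\n' > wordcount.txt

# Approximate the MR wordcount output locally: one token per line, then count
tr -s ' ' '\n' < wordcount.txt | sort | uniq -c | sort -rn
```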
-
-## Tensorflow Job with CPU
-
-### Standalone Mode
-
-#### Clean up apps with the same name
-
-Suppose we want to submit a tensorflow job named standalone-tf. First destroy any application with the same name and clean up historical job directories:
-
-```bash
-./bin/yarn app -destroy standalone-tf
-./bin/hdfs dfs -rmr hdfs://${dfs_name_service}/tmp/cifar-10-jobdir
-```
-where ${dfs_name_service} is the hdfs name service you use.
-
-#### Run a standalone tensorflow job
-
-```bash
-./bin/yarn jar 
/home/hadoop/hadoop-current/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar
 job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 --name standalone-tf \
- --docker_image dockerfile-cpu-tf1.8.0-with-models \
- --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \
- --checkpoint_path hdfs://${dfs_name_service}/user/hadoop/tf-checkpoint \
- --worker_resources memory=4G,vcores=2 --verbose \
- --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py 
--data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data 
--job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 
--eval-batch-size=16 --train-batch-size=16 --num-gpus=0"
-```
-
-### Distributed Mode
-
-#### Clean up apps with the same name
-
-```bash
-./bin/yarn app -destroy distributed-tf
-./bin/hdfs dfs -rmr hdfs://${dfs_name_service}/tmp/cifar-10-jobdir
-```
-
-#### Run a distributed tensorflow job
-
-```bash
-./bin/yarn jar 
/home/hadoop/hadoop-current/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar
 job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 --name distributed-tf \
- --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
- --docker_image dockerfile-cpu-tf1.8.0-with-models \
- --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \
- --checkpoint_path 
hdfs://${dfs_name_service}/user/hadoop/tf-distributed-checkpoint \
- --worker_resources memory=4G,vcores=2 --verbose \
- --num_ps 1 \
- --ps_resources memory=4G,vcores=2 \
- --ps_launch_cmd "python /test/cifar10_estimator/cifar10_main.py 
--data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data 
--job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --num-gpus=0" \
- --num_workers 4 \
- --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py 
--data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data 
--job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 
--eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=0"
-```
-
-
-## Tensorflow Job with GPU
-
-### GPU configurations for both resourcemanager and nodemanager
-
-Add the yarn resource configuration file named resource-types.xml:
-
-   ```
-   <configuration>
-     <property>
-       <name>yarn.resource-types</name>
-       <value>yarn.io/gpu</value>
-     </property>
-   </configuration>
-   ```
-
-#### GPU configurations for resourcemanager
-
-The resourcemanager must use the capacity scheduler, and yarn.scheduler.capacity.resource-calculator in capacity-scheduler.xml should be set to DominantResourceCalculator:
-
-   ```
-   <configuration>
-     <property>
-       <name>yarn.scheduler.capacity.resource-calculator</name>
-       
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
-     </property>
-   </configuration>
-   ```
-
-#### GPU configurations for nodemanager
-
-Add configurations in yarn-site.xml
-
-   ```
-   <configuration>
-     <property>
-       <name>yarn.nodemanager.resource-plugins</name>
-       <value>yarn.io/gpu</value>
-     </property>
-   </configuration>
-   ```
-
-Add configurations in container-executor.cfg
-
-   ```
-   [docker]
-   ...
-   # Add configurations in `[docker]` part:
-   # /usr/bin/nvidia-docker is the path of nvidia-docker command
-   # nvidia_driver_<version> means the nvidia driver version, e.g. nvidia_driver_375.26.
-   # The nvidia-smi command can be used to check the version
-   docker.allowed.volume-drivers=/usr/bin/nvidia-docker
-   
docker.allowed.devices=/dev/nvidiactl,/dev/nvidia-uvm,/dev/nvidia-uvm-tools,/dev/nvidia1,/dev/nvidia0
-   docker.allowed.ro-mounts=nvidia_driver_<version>
-
-   [gpu]
-   module.enabled=true
-
-   [cgroups]
-   # /sys/fs/cgroup is the cgroup mount destination
-   # /hadoop-yarn is the path yarn creates by default
-   root=/sys/fs/cgroup
-   yarn-hierarchy=/hadoop-yarn
-   ```
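The `<version>` suffix of the ro-mount must match the driver installed on each node; a minimal sketch (with a hypothetical version string, matching the nvidia-detect output earlier) of composing the mount name:

```shell
#!/bin/sh
# Hypothetical value: in practice read the version from nvidia-smi on the node
driver_version="390.87"

# Compose the ro-mount name expected by container-executor.cfg
mount_name="nvidia_driver_${driver_version}"
echo "docker.allowed.ro-mounts=${mount_name}"
```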
-
-## Yarn Service Runtime Requirement (Deprecated)
-
-The "yarn native service" function is available since hadoop 3.1.0.
-Submarine can utilize yarn native service to submit an ML job. However,
-several other components are required, and they are hard to enable and
-maintain. So the yarn service runtime is deprecated since submarine 0.3.0.
-We recommend using YarnRuntime instead. If you still want to enable it, please
-follow these steps.
-
-### Etcd Installation
-
-etcd is a reliable distributed key-value store for the most critical data of a distributed system; here it is used for the registration and discovery of services running in containers.
-You can also choose alternatives like zookeeper or Consul.
-
-To install Etcd on the specified servers, run Submarine-installer/install.sh:
-
-```shell
-$ ./Submarine-installer/install.sh
-# Etcd status
-systemctl status Etcd.service
-```
-
-Check Etcd cluster health
-
-```shell
-$ etcdctl cluster-health
-member 3adf2673436aa824 is healthy: got healthy result from 
http://${etcd_host_ip1}:2379
-member 85ffe9aafb7745cc is healthy: got healthy result from 
http://${etcd_host_ip2}:2379
-member b3d05464c356441a is healthy: got healthy result from 
http://${etcd_host_ip3}:2379
-cluster is healthy
-
-$ etcdctl member list
-3adf2673436aa824: name=etcdnode3 peerURLs=http://${etcd_host_ip1}:2380 
clientURLs=http://${etcd_host_ip1}:2379 isLeader=false
-85ffe9aafb7745cc: name=etcdnode2 peerURLs=http://${etcd_host_ip2}:2380 
clientURLs=http://${etcd_host_ip2}:2379 isLeader=false
-b3d05464c356441a: name=etcdnode1 peerURLs=http://${etcd_host_ip3}:2380 
clientURLs=http://${etcd_host_ip3}:2379 isLeader=true
-```
-
-### Calico Installation
-
-Calico creates and manages a flat layer-3 network; each container is assigned a routable IP. The steps are included here for your convenience.
-You can also choose alternatives like Flannel or OVS.
-
-To install Calico on the specified servers, run Submarine-installer/install.sh:
-
-```
-systemctl start calico-node.service
-systemctl status calico-node.service
-```
-
-#### Check Calico Network
-
-```shell
-# Run the following command to show the status of all hosts in the cluster except localhost.
-$ calicoctl node status
-Calico process is running.
-
-IPv4 BGP status
-+---------------+-------------------+-------+------------+-------------+
-| PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |    INFO     |
-+---------------+-------------------+-------+------------+-------------+
-| ${host_ip1} | node-to-node mesh | up    | 2018-09-21 | Established |
-| ${host_ip2} | node-to-node mesh | up    | 2018-09-21 | Established |
-| ${host_ip3} | node-to-node mesh | up    | 2018-09-21 | Established |
-+---------------+-------------------+-------+------------+-------------+
-
-IPv6 BGP status
-No IPv6 peers found.
-```
-
-Create containers to validate calico network
-
-```
-docker network create --driver calico --ipam-driver calico-ipam calico-network
-docker run --net calico-network --name workload-A -tid busybox
-docker run --net calico-network --name workload-B -tid busybox
-docker exec workload-A ping workload-B
-```
-
-### Enable calico network for docker containers
-
-Set yarn-site.xml to use calico-network as the default network for docker containers:
-
-```
-<property>
-    
<name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
-    <value>calico-network</value>
-  </property>
-  <property>
-    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
-    <value>default,docker</value>
-  </property>
-  <property>
-    
<name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
-    <value>host,none,bridge,calico-network</value>
-  </property>
-```
-
-Add calico-network to container-executor.cfg
-
-```
-docker.allowed.networks=bridge,host,none,calico-network
-```
-
-Then restart all nodemanagers.
-
-### Start yarn registry dns service
-
-The Yarn registry DNS server exposes existing service-discovery information via DNS
-and enables docker container name to IP mappings. With it, the containers of an
-ML job know how to communicate with each other.
-
-Please specify a server to start the yarn registry dns service. For details please
-refer to [Registry DNS Server](http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html)
-
-```
-sudo YARN_LOGFILE=registrydns.log ./yarn-daemon.sh start registrydns
-```
-
-### Run submarine job
-
-Set submarine.runtime.class to YarnServiceRuntimeFactory in submarine-site.xml.
-```
-<property>
-    <name>submarine.runtime.class</name>
-    
<value>org.apache.submarine.server.submitter.yarnservice.YarnServiceRuntimeFactory</value>
-    <description>RuntimeFactory for Submarine jobs</description>
-  </property>
-```
-
-#### Standalone Mode
-
-Suppose we want to submit a tensorflow job named standalone-tf. First destroy any application with the same name and clean up historical job directories:
-
-```bash
-./bin/yarn app -destroy standalone-tf
-./bin/hdfs dfs -rmr hdfs://${dfs_name_service}/tmp/cifar-10-jobdir
-```
-where ${dfs_name_service} is the hdfs name service you use.
-
-Run a standalone tensorflow job
-
-```
-CLASSPATH=`${HADOOP_HOME}/bin/hadoop classpath 
--glob`:${SUBMARINE_HOME}/submarine-all-${SUBMARINE_VERSION}.jar:
-${SUBMARINE_HOME}/conf: \
-java org.apache.submarine.client.cli.Cli job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 --name standalone-tf \
- --docker_image dockerfile-cpu-tf1.8.0-with-models \
- --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \
- --checkpoint_path hdfs://${dfs_name_service}/user/hadoop/tf-checkpoint \
- --worker_resources memory=4G,vcores=2 --verbose \
- --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py 
--data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data 
--job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 
--eval-batch-size=16 --train-batch-size=16 --num-gpus=0"
-```
-
-#### Distributed Mode
-
-Clean up apps with the same name
-
-```bash
-./bin/yarn app -destroy distributed-tf
-./bin/hdfs dfs -rmr hdfs://${dfs_name_service}/tmp/cifar-10-jobdir
-```
-
-Run a distributed tensorflow job
-
-```
-CLASSPATH=`${HADOOP_HOME}/bin/hadoop classpath 
--glob`:${SUBMARINE_HOME}/submarine-all-${SUBMARINE_VERSION}.jar:
-${SUBMARINE_HOME}/conf: \
-java org.apache.submarine.client.cli.Cli job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 --name distributed-tf \
- --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
- --docker_image dockerfile-cpu-tf1.8.0-with-models \
- --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \
- --checkpoint_path 
hdfs://${dfs_name_service}/user/hadoop/tf-distributed-checkpoint \
- --worker_resources memory=4G,vcores=2 --verbose \
- --num_ps 1 \
- --ps_resources memory=4G,vcores=2 \
- --ps_launch_cmd "python /test/cifar10_estimator/cifar10_main.py 
--data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data 
--job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --num-gpus=0" \
- --num_workers 4 \
- --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py 
--data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data 
--job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 
--eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=0"
-```
diff --git a/dev-support/submarine-installer/InstallationGuideChineseVersion.md 
b/dev-support/submarine-installer/InstallationGuideChineseVersion.md
deleted file mode 100644
index 5339a74..0000000
--- a/dev-support/submarine-installer/InstallationGuideChineseVersion.md
+++ /dev/null
@@ -1,845 +0,0 @@
-# Submarine Installation Guide
-
-## Prerequisites
-
-### Operating System
-
-The operating system version we use is centos-release-7-3.1611.el7.centos.x86_64 with kernel 3.10.0-514.el7.x86_64; treat these as the minimum versions.
-
-| Environment | Version |
-| ------ | ------ |
-| Operating System | centos-release-7-3.1611.el7.centos.x86_64 |
-| Kernel | 3.10.0-514.el7.x86_64 |
-
-### User & Group
-
-If these users and groups are missing from the operating system, they must be added. Some are required to run hadoop, and some are required to run docker.
-
-```
-adduser hdfs
-adduser mapred
-adduser yarn
-addgroup hadoop
-usermod -aG hdfs,hadoop hdfs
-usermod -aG mapred,hadoop mapred
-usermod -aG yarn,hadoop yarn
-usermod -aG hdfs,hadoop hadoop
-groupadd docker
-usermod -aG docker yarn
-usermod -aG docker hadoop
-```
-
-### GCC Version
-
-```bash
-gcc --version
-gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
-# install with the following command if needed
-yum install gcc make g++
-```
-
-### Kernel header & devel
-
-```bash
-# Approach 1:
-yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
-# Approach 2:
-wget 
http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-headers-3.10.0-514.el7.x86_64.rpm
-rpm -ivh kernel-headers-3.10.0-514.el7.x86_64.rpm
-```
-
-### Check GPU Version
-
-```
-lspci | grep -i nvidia
-
-# If nothing is printed, the GPU is not supported. The following is my output:
-# 04:00.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)
-# 82:00.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)
-```
-
-
-
-### Install Nvidia Driver
-
-Before installing the nvidia driver/cuda, make sure any previously installed nvidia driver/cuda has been cleaned up.
-
-```
-# uninstall cuda:
-sudo /usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl
-
-# uninstall nvidia-driver:
-sudo /usr/bin/nvidia-uninstall
-```
-
-Install nvidia-detect to check the GPU version
-
-```
-yum install nvidia-detect
-# run 'nvidia-detect -v' to get the result:
-nvidia-detect -v
-Probing for supported NVIDIA devices...
-[10de:13bb] NVIDIA Corporation GM107GL [Quadro K620]
-This device requires the current 390.87 NVIDIA driver kmod-nvidia
-[8086:1912] Intel Corporation HD Graphics 530
-An Intel display controller was also detected
-```
-
-Note the information [Quadro K620] and 390.87 here.
-Download [NVIDIA-Linux-x86_64-390.87.run](https://www.nvidia.com/object/linux-amd64-display-archive.html)
-
-
-Some preparatory work before the installation
-
-```
-# This may take a while if the system has not been updated for a long time
-yum -y update
-yum -y install kernel-devel
-
-yum -y install epel-release
-yum -y install dkms
-
-# Disable nouveau
-vim /etc/default/grub  # add rd.driver.blacklist=nouveau nouveau.modeset=0 to GRUB_CMDLINE_LINUX
-grub2-mkconfig -o /boot/grub2/grub.cfg # generate the configuration
-vim /etc/modprobe.d/blacklist.conf # open (or create) the file and add: blacklist nouveau
-
-mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
-dracut /boot/initramfs-$(uname -r).img $(uname -r)   # update the configuration, then reboot
-reboot
-```
-
-After rebooting, confirm that nouveau is disabled
-
-```
-lsmod | grep nouveau  # should return nothing
-
-# start the installation
-sh NVIDIA-Linux-x86_64-390.87.run
-```
-
-You will encounter some options during the installation:
-
-```
-Install NVIDIA's 32-bit compatibility libraries (Yes)
-Would you like to run the nvidia-xconfig utility to automatically update your 
X configuration file... (NO)
-```
-
-
-Finally, check the Nvidia GPU status:
-
-```
-nvidia-smi
-```
-
-Reference:
-https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
-
-
-
-### Install Docker
-
-```
-yum -y update
-yum -y install yum-utils
-yum-config-manager --add-repo https://yum.dockerproject.org/repo/main/centos/7
-yum -y update
-
-# Show the available packages
-yum search --showduplicates docker-engine
-
-# Install docker 1.12.5
-yum -y --nogpgcheck install docker-engine-1.12.5*
-systemctl start docker
-
-chown hadoop:netease /var/run/docker.sock
-chown hadoop:netease /usr/bin/docker
-```
-
-Reference: https://docs.docker.com/cs-engine/1.12/
-
-### Configure Docker
-
-Create a `daemon.json` file in the `/etc/docker/` directory and add the following configuration. Variables such as image_registry_ip, etcd_host_ip, localhost_ip, yarn_dns_registry_host_ip and dns_host_ip need to be adjusted for your environment.
-
-```
-{
-    "insecure-registries": ["${image_registry_ip}:5000"],
-    "cluster-store": "etcd://${etcd_host_ip1}:2379,${etcd_host_ip2}:2379,${etcd_host_ip3}:2379",
-    "cluster-advertise": "${localhost_ip}:2375",
-    "dns": ["${yarn_dns_registry_host_ip}", "${dns_host_ip1}"],
-    "hosts": ["tcp://${localhost_ip}:2375", "unix:///var/run/docker.sock"]
-}
-```
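Before restarting the daemon, it can help to render the template with concrete addresses and confirm the result parses as JSON; the sketch below uses Python's `string.Template` with made-up placeholder IPs (all values are hypothetical and must be replaced with your own):

```python
import json
from string import Template

# Hypothetical addresses; substitute the values for your environment.
values = {
    "image_registry_ip": "10.0.0.10",
    "etcd_host_ip1": "10.0.0.11",
    "etcd_host_ip2": "10.0.0.12",
    "etcd_host_ip3": "10.0.0.13",
    "localhost_ip": "10.0.0.20",
    "yarn_dns_registry_host_ip": "10.0.0.30",
    "dns_host_ip1": "8.8.8.8",
}

template = Template("""{
    "insecure-registries": ["${image_registry_ip}:5000"],
    "cluster-store": "etcd://${etcd_host_ip1}:2379,${etcd_host_ip2}:2379,${etcd_host_ip3}:2379",
    "cluster-advertise": "${localhost_ip}:2375",
    "dns": ["${yarn_dns_registry_host_ip}", "${dns_host_ip1}"],
    "hosts": ["tcp://${localhost_ip}:2375", "unix:///var/run/docker.sock"]
}""")

# json.loads raises ValueError if the rendered configuration is malformed
config = json.loads(template.substitute(values))
print(config["cluster-advertise"])
```

If the file is valid, the script prints the advertise address; a quoting mistake fails fast here instead of at daemon startup.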
-
-Restart the docker daemon:
-
-```
-sudo systemctl restart docker
-```
-
-
-
-### Docker EE version
-
-```bash
-$ docker version
-
-Client:
- Version:      1.12.5
- API version:  1.24
- Go version:   go1.6.4
- Git commit:   7392c3b
- Built:        Fri Dec 16 02:23:59 2016
- OS/Arch:      linux/amd64
-
-Server:
- Version:      1.12.5
- API version:  1.24
- Go version:   go1.6.4
- Git commit:   7392c3b
- Built:        Fri Dec 16 02:23:59 2016
- OS/Arch:      linux/amd64
-```
-
-### Install nvidia-docker
-
-Submarine in Hadoop 3.2 uses nvidia-docker 1.0:
-
-```
-wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
-sudo rpm -i /tmp/nvidia-docker*.rpm
-# Start nvidia-docker
-sudo systemctl start nvidia-docker
-
-# Check the nvidia-docker status:
-systemctl status nvidia-docker
-
-# Check the nvidia-docker logs:
-journalctl -u nvidia-docker
-
-# Check whether nvidia-docker-plugin works
-curl http://localhost:3476/v1.0/docker/cli
-```
-
-Under `/var/lib/nvidia-docker/volumes/nvidia_driver/`, create a directory named after the `nvidia-driver` version:
-
-```
-mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/390.87
-# 390.87 is the nvidia driver version number
-
-mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/390.87/bin
-mkdir /var/lib/nvidia-docker/volumes/nvidia_driver/390.87/lib64
-
-cp /usr/bin/nvidia* /var/lib/nvidia-docker/volumes/nvidia_driver/390.87/bin
-cp /usr/lib64/libcuda* /var/lib/nvidia-docker/volumes/nvidia_driver/390.87/lib64
-cp /usr/lib64/libnvidia* /var/lib/nvidia-docker/volumes/nvidia_driver/390.87/lib64
-
-# Test nvidia-smi
-nvidia-docker run --rm nvidia/cuda:9.0-devel nvidia-smi
-```
-
-Test the docker, nvidia-docker and nvidia-driver installation:
-
-```
-# Test 1
-nvidia-docker run --rm nvidia/cuda nvidia-smi
-```
-
-```
-# Test 2
-nvidia-docker run -it tensorflow/tensorflow:1.9.0-gpu bash
-# Run inside the container:
-python
-import tensorflow as tf
-tf.test.is_gpu_available()
-```
-
-How to uninstall nvidia-docker 1.0:
-https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
-
-Reference:
-https://github.com/NVIDIA/nvidia-docker/tree/1.0
-
-
-
-### Tensorflow Image
-
-There is no need to install CUDNN and CUDA on the physical servers, because Submarine provides images that already contain CUDNN and CUDA. The base Dockerfile can be found in WriteDockerfile.md.
-
-
-The above images do not support a kerberos environment. If you need kerberos, you can use the following Dockerfile:
-
-```shell
-FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
-
-# Pick up some TF dependencies
-RUN apt-get update && apt-get install -y --allow-downgrades --no-install-recommends \
-        build-essential \
-        cuda-command-line-tools-9-0 \
-        cuda-cublas-9-0 \
-        cuda-cufft-9-0 \
-        cuda-curand-9-0 \
-        cuda-cusolver-9-0 \
-        cuda-cusparse-9-0 \
-        curl \
-        libcudnn7=7.0.5.15-1+cuda9.0 \
-        libfreetype6-dev \
-        libpng12-dev \
-        libzmq3-dev \
-        pkg-config \
-        python \
-        python-dev \
-        rsync \
-        software-properties-common \
-        unzip \
-        && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
-
-RUN export DEBIAN_FRONTEND=noninteractive && apt-get update && apt-get install -yq krb5-user libpam-krb5 && apt-get clean
-
-RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
-    python get-pip.py && \
-    rm get-pip.py
-
-RUN pip --no-cache-dir install \
-        Pillow \
-        h5py \
-        ipykernel \
-        jupyter \
-        matplotlib \
-        numpy \
-        pandas \
-        scipy \
-        sklearn \
-        && \
-    python -m ipykernel.kernelspec
-
-# Install TensorFlow GPU version.
-RUN pip --no-cache-dir install \
-    http://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.8.0-cp27-none-linux_x86_64.whl
-RUN apt-get update && apt-get install git -y
-
-RUN apt-get update && apt-get install -y openjdk-8-jdk wget
-# Download hadoop-3.1.1.tar.gz
-RUN wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
-RUN tar zxf hadoop-3.1.1.tar.gz
-RUN mv hadoop-3.1.1 hadoop-3.1.0
-
-# Download a JDK package that supports kerberos
-RUN wget -qO jdk8.tar.gz 'http://${kerberos_jdk_url}/jdk-8u152-linux-x64.tar.gz'
-RUN tar xzf jdk8.tar.gz -C /opt
-RUN mv /opt/jdk* /opt/java
-RUN rm jdk8.tar.gz
-RUN update-alternatives --install /usr/bin/java java /opt/java/bin/java 100
-RUN update-alternatives --install /usr/bin/javac javac /opt/java/bin/javac 100
-
-ENV JAVA_HOME /opt/java
-ENV PATH $PATH:$JAVA_HOME/bin
-```
-
-
-### Test the TF environment
-
-After building the docker image, manually check that TensorFlow works properly inside it before scheduling jobs through YARN. You can run the following commands:
-
-```shell
-$ docker run -it ${docker_image_name} /bin/bash
-# >>> Inside the container
-$ python
->>> import tensorflow as tf
->>> tf.__version__
-```
-
-If anything goes wrong, you can troubleshoot along the following lines:
-
-1. Check whether the environment variables are set correctly:
-
-   ```
-   echo $LD_LIBRARY_PATH
-   /usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
-   ```
-
-2. Check whether libcuda.so.1 and libcuda.so are on a path listed in LD_LIBRARY_PATH:
-
-   ```
-   ls -l /usr/local/nvidia/lib64 | grep libcuda.so
-   ```
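The two manual checks above can be wrapped into one small helper that walks an LD_LIBRARY_PATH-style search string and reports any libcuda libraries it finds. This is a standalone sketch; the throwaway demo directory stands in for /usr/local/nvidia/lib64:

```python
import os
import tempfile

def find_libcuda(search_path):
    """Return every libcuda.so* file found in the colon-separated path list."""
    hits = []
    for entry in search_path.split(":"):
        if os.path.isdir(entry):
            for name in sorted(os.listdir(entry)):
                if name.startswith("libcuda.so"):
                    hits.append(os.path.join(entry, name))
    return hits

# Demo against a throwaway directory; inside a real container you would pass
# os.environ.get("LD_LIBRARY_PATH", "") instead.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "libcuda.so.1"), "w").close()
print(find_libcuda(demo_dir))
```

An empty result for the real LD_LIBRARY_PATH means the driver libraries were not mounted into the container.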
-
-## Hadoop Installation
-
-### Install Hadoop
-First, get the hadoop package either by building from source or by downloading it from the official site [Hadoop Homepage](https://hadoop.apache.org/).
-Then, follow the [Hadoop Cluster Setup](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html)
-guide to install the Hadoop cluster.
-
-
-
-### Start the YARN services
-
-```
-YARN_LOGFILE=resourcemanager.log ./sbin/yarn-daemon.sh start resourcemanager
-YARN_LOGFILE=nodemanager.log ./sbin/yarn-daemon.sh start nodemanager
-YARN_LOGFILE=timeline.log ./sbin/yarn-daemon.sh start timelineserver
-YARN_LOGFILE=mr-historyserver.log ./sbin/mr-jobhistory-daemon.sh start historyserver
-```
-
-
-### Test with wordcount
-
-Run the simplest wordcount job to check that YARN is installed correctly:
-
-```
-./bin/hadoop jar /home/hadoop/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar wordcount /tmp/wordcount.txt /tmp/wordcount-output4
-```
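The MapReduce job counts words; before comparing its HDFS output you can compute the expected counts locally with standard tools. This is a rough local equivalent for sanity-checking, not the job itself, and the input content below is made up:

```shell
# Create a tiny local input file (copy it to HDFS with `hdfs dfs -put` for the real job)
cat > /tmp/wordcount.txt <<'EOF'
hello submarine
hello yarn
EOF

# Local word-count equivalent for eyeballing the expected result
tr -s ' ' '\n' < /tmp/wordcount.txt | sort | uniq -c | sort -rn
```

The job's `part-r-*` files under /tmp/wordcount-output4 should show the same word/count pairs.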
-
-
-
-## Tensorflow Jobs with CPU
-
-### Standalone mode
-
-#### Destroy the job with the same name
-
-```bash
-# Run before every submission:
-./bin/yarn app -destroy standalone-tf
-# And delete the hdfs path:
-./bin/hdfs dfs -rmr hdfs://${dfs_name_service}/tmp/cifar-10-jobdir
-# Make sure the previous job has finished
-```
-Here, replace the variable ${dfs_name_service} with your own hdfs name service name.
-
-#### Run a standalone tensorflow job
-
-```bash
-./bin/yarn jar /home/hadoop/hadoop-current/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 --name standalone-tf \
- --docker_image dockerfile-cpu-tf1.8.0-with-models \
 - --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \
 - --checkpoint_path hdfs://${dfs_name_service}/user/hadoop/tf-checkpoint \
 - --worker_resources memory=4G,vcores=2 --verbose \
 - --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 --eval-batch-size=16 --train-batch-size=16 --num-gpus=0"
-```
-
-
-### Distributed mode
-
-#### Destroy the job with the same name
-
-```bash
-# Run before every submission:
-./bin/yarn app -destroy distributed-tf
-# And delete the hdfs path:
-./bin/hdfs dfs -rmr hdfs://${dfs_name_service}/tmp/cifar-10-jobdir
-# Make sure the previous job has finished
-```
-
-#### Submit a distributed tensorflow job
-
-```bash
-./bin/yarn jar /home/hadoop/hadoop-current/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 --name distributed-tf \
- --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
- --docker_image dockerfile-cpu-tf1.8.0-with-models \
- --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \
- --checkpoint_path hdfs://${dfs_name_service}/user/hadoop/tf-distributed-checkpoint \
- --worker_resources memory=4G,vcores=2 --verbose \
- --num_ps 1 \
- --ps_resources memory=4G,vcores=2 \
- --ps_launch_cmd "python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --num-gpus=0" \
- --num_workers 4 \
- --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 --eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=0"
-```
-
-
-## Tensorflow Jobs with GPU
-
-### Add GPU support to the Resourcemanager and Nodemanager
-
-Create resource-types.xml in the yarn configuration directory (conf or etc/hadoop) and add:
-
-   ```
-   <configuration>
-     <property>
-       <name>yarn.resource-types</name>
-       <value>yarn.io/gpu</value>
-     </property>
-   </configuration>
-   ```
-
-### GPU configuration for the Resourcemanager
-
-The scheduler used by the resourcemanager must be the capacity scheduler. Modify the following property in capacity-scheduler.xml:
-
-   ```
-   <configuration>
-     <property>
-       <name>yarn.scheduler.capacity.resource-calculator</name>
-       <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
-     </property>
-   </configuration>
-   ```
-
-### GPU configuration for the Nodemanager
-
-Add the following configuration to the nodemanager's yarn-site.xml:
-
-   ```
-   <configuration>
-     <property>
-       <name>yarn.nodemanager.resource-plugins</name>
-       <value>yarn.io/gpu</value>
-     </property>
-   </configuration>
-   ```
-
-Add the following configuration to container-executor.cfg:
-
-   ```
-   [docker]
-   ...
-   # Add the following to the existing [docker] section:
-   # /usr/bin/nvidia-docker is the nvidia-docker path
-   # The version number 375.26 in nvidia_driver_375.26 can be checked with nvidia-smi
-   docker.allowed.volume-drivers=/usr/bin/nvidia-docker
-   docker.allowed.devices=/dev/nvidiactl,/dev/nvidia-uvm,/dev/nvidia-uvm-tools,/dev/nvidia1,/dev/nvidia0
-   docker.allowed.ro-mounts=nvidia_driver_375.26
-
-   [gpu]
-   module.enabled=true
-
-   [cgroups]
-   # /sys/fs/cgroup is the cgroup mount path
-   # /hadoop-yarn is the default path created by yarn under the cgroup path
-   root=/sys/fs/cgroup
-   yarn-hierarchy=/hadoop-yarn
-   ```
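A typo in container-executor.cfg usually only surfaces when a container launches, so it can be worth parsing the file up front. The fragment below mirrors the example above (the device list and the 375.26 driver version are machine-specific); Python's `configparser` understands the section/key=value layout:

```python
import configparser

# Fragment mirroring the example above; paths and the 375.26 version are
# machine-specific and must match the output of nvidia-smi on your node.
cfg_text = """
[docker]
docker.allowed.volume-drivers=/usr/bin/nvidia-docker
docker.allowed.devices=/dev/nvidiactl,/dev/nvidia-uvm,/dev/nvidia-uvm-tools,/dev/nvidia1,/dev/nvidia0
docker.allowed.ro-mounts=nvidia_driver_375.26

[gpu]
module.enabled=true

[cgroups]
root=/sys/fs/cgroup
yarn-hierarchy=/hadoop-yarn
"""

parser = configparser.ConfigParser()
parser.read_string(cfg_text)

# The GPU module must be enabled and every allowed device must be a device node
assert parser.get("gpu", "module.enabled") == "true"
devices = parser.get("docker", "docker.allowed.devices").split(",")
assert all(d.startswith("/dev/") for d in devices)
print(parser.get("cgroups", "root"))
```

To check the real file, replace `read_string(cfg_text)` with `parser.read("/path/to/container-executor.cfg")`.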
-
-### Submit a job for validation
-
-Distributed-shell + GPU + cgroup
-
-```bash
- ./yarn jar /home/hadoop/hadoop-current/share/hadoop/yarn/hadoop-yarn-submarine-3.2.0-SNAPSHOT.jar job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 --name distributed-tf-gpu \
- --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
- --docker_image gpu-cuda9.0-tf1.8.0-with-models \
- --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \
- --checkpoint_path hdfs://${dfs_name_service}/user/hadoop/tf-distributed-checkpoint \
- --num_ps 0 \
- --ps_resources memory=4G,vcores=2,gpu=0 \
- --ps_launch_cmd "python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --num-gpus=0" \
- --worker_resources memory=4G,vcores=2,gpu=1 --verbose \
- --num_workers 1 \
- --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 --eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=1"
-```
-
-## Yarn Service Runtime (Deprecated)
-
-hadoop 3.1.0 provides the yarn native service feature, and Submarine can use yarn native service to submit distributed machine learning jobs.
-However, yarn native service introduces some extra components that make deployment and operations difficult, so the Yarn Service Runtime is no longer recommended since Submarine 0.3.0. We recommend using
-YarnRuntime directly, which allows machine learning jobs to be submitted on yarn 2.9.
-To enable the Yarn Service Runtime, follow the steps below.
-
-### Install Etcd
-
-Run the Submarine/install.sh script to install the Etcd component and its service auto-start script on the specified servers.
-
-```shell
-$ ./Submarine/install.sh
-# Check the Etcd service status with:
-systemctl status Etcd.service
-```
-
-Check the Etcd service status:
-
-```shell
-$ etcdctl cluster-health
-member 3adf2673436aa824 is healthy: got healthy result from http://${etcd_host_ip1}:2379
-member 85ffe9aafb7745cc is healthy: got healthy result from http://${etcd_host_ip2}:2379
-member b3d05464c356441a is healthy: got healthy result from http://${etcd_host_ip3}:2379
-cluster is healthy
-
-$ etcdctl member list
-3adf2673436aa824: name=etcdnode3 peerURLs=http://${etcd_host_ip1}:2380 clientURLs=http://${etcd_host_ip1}:2379 isLeader=false
-85ffe9aafb7745cc: name=etcdnode2 peerURLs=http://${etcd_host_ip2}:2380 clientURLs=http://${etcd_host_ip2}:2379 isLeader=false
-b3d05464c356441a: name=etcdnode1 peerURLs=http://${etcd_host_ip3}:2380 clientURLs=http://${etcd_host_ip3}:2379 isLeader=true
-```
-Here, ${etcd_host_ip*} are the IPs of the etcd servers.
-
-
-### Install Calico
-
-Run the Submarine/install.sh script to install the Calico component and its service auto-start script on the specified servers.
-
-```
-systemctl start calico-node.service
-systemctl status calico-node.service
-```
-
-#### Check the Calico network
-
-```shell
-# Run the following command. Note: it does not show the status of the local server, only the other servers
-$ calicoctl node status
-Calico process is running.
-
-IPv4 BGP status
-+---------------+-------------------+-------+------------+-------------+
-| PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |    INFO     |
-+---------------+-------------------+-------+------------+-------------+
-| ${host_ip1} | node-to-node mesh | up    | 2018-09-21 | Established |
-| ${host_ip2} | node-to-node mesh | up    | 2018-09-21 | Established |
-| ${host_ip3} | node-to-node mesh | up    | 2018-09-21 | Established |
-+---------------+-------------------+-------+------------+-------------+
-
-IPv6 BGP status
-No IPv6 peers found.
-```
-
-Create docker containers to verify the calico network:
-
-```
-docker network create --driver calico --ipam-driver calico-ipam calico-network
-docker run --net calico-network --name workload-A -tid busybox
-docker run --net calico-network --name workload-B -tid busybox
-docker exec workload-A ping workload-B
-```
-
-### Enable the Calico network for Yarn Docker containers
-In the yarn-site.xml configuration file, set the Calico network for docker containers:
-
-```
-<property>
-    <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
-    <value>calico-network</value>
-  </property>
-  <property>
-    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
-    <value>default,docker</value>
-  </property>
-  <property>
-    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
-    <value>host,none,bridge,calico-network</value>
-  </property>
-```
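A mismatch between the default container network and the allowed-networks list is an easy mistake with these properties. The sketch below parses the same three properties (wrapped in a `<configuration>` root so the fragment is well-formed XML) and cross-checks them:

```python
import xml.etree.ElementTree as ET

# The three properties above, wrapped in a <configuration> root element
fragment = """<configuration>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
    <value>calico-network</value>
  </property>
  <property>
    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
    <value>default,docker</value>
  </property>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
    <value>host,none,bridge,calico-network</value>
  </property>
</configuration>"""

props = {p.findtext("name"): p.findtext("value")
         for p in ET.fromstring(fragment).iter("property")}

default_net = props["yarn.nodemanager.runtime.linux.docker.default-container-network"]
allowed = props["yarn.nodemanager.runtime.linux.docker.allowed-container-networks"].split(",")
# Containers fail to start if the default network is not in the allowed list
assert default_net in allowed
print(default_net)
```

The same check can be pointed at the real yarn-site.xml by replacing `fragment` with the file contents.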
-
-In the container-executor.cfg configuration file, add the bridge network:
-
-```
-docker.allowed.networks=bridge,host,none,calico-network
-```
-
-Restart all nodemanager nodes.
-
-### Start the registry dns service
-
-The Yarn registry dns server is a DNS service implemented for service discovery. A yarn docker container registers with the registry dns server to expose the mapping between the container domain name and the container IP/port.
-
-For detailed configuration and deployment of Yarn registry dns, see [Registry DNS Server](http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html)
-
-Command to start the registry dns:
-```
-sudo YARN_LOGFILE=registrydns.log ./yarn-daemon.sh start registrydns
-```
-
-### Run submarine jobs
-
-Set submarine.runtime.class in the submarine-site.xml configuration file:
-```
-<property>
-    <name>submarine.runtime.class</name>
-    <value>org.apache.submarine.server.submitter.yarnservice.YarnServiceRuntimeFactory</value>
-    <description>RuntimeFactory for Submarine jobs</description>
-  </property>
-```
-
-#### Standalone mode
-
-Destroy the job with the same name:
-
-```bash
-./bin/yarn app -destroy standalone-tf
-./bin/hdfs dfs -rmr hdfs://${dfs_name_service}/tmp/cifar-10-jobdir
-```
-Here, replace the variable ${dfs_name_service} with your own hdfs name service name.
-
-Run a standalone tensorflow job:
-
-```
-CLASSPATH=`${HADOOP_HOME}/bin/hadoop classpath --glob`:${SUBMARINE_HOME}/submarine-all-${SUBMARINE_VERSION}.jar:${SUBMARINE_HOME}/conf: \
-java org.apache.submarine.client.cli.Cli job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 --name standalone-tf \
- --docker_image dockerfile-cpu-tf1.8.0-with-models \
- --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \
- --checkpoint_path hdfs://${dfs_name_service}/user/hadoop/tf-checkpoint \
- --worker_resources memory=4G,vcores=2 --verbose \
- --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 --eval-batch-size=16 --train-batch-size=16 --num-gpus=0"
-```
-
-#### Distributed mode
-
-Destroy the job with the same name:
-
-```bash
-./bin/yarn app -destroy distributed-tf
-./bin/hdfs dfs -rmr hdfs://${dfs_name_service}/tmp/cifar-10-jobdir
-```
-
-Submit a distributed tensorflow job:
-
-```
-CLASSPATH=`${HADOOP_HOME}/bin/hadoop classpath --glob`:${SUBMARINE_HOME}/submarine-all-${SUBMARINE_VERSION}.jar:${SUBMARINE_HOME}/conf: \
-java org.apache.submarine.client.cli.Cli job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 --name distributed-tf \
- --docker_image dockerfile-cpu-tf1.8.0-with-models \
- --input_path hdfs://${dfs_name_service}/tmp/cifar-10-data \
- --checkpoint_path hdfs://${dfs_name_service}/user/hadoop/tf-distributed-checkpoint \
- --worker_resources memory=4G,vcores=2 --verbose \
- --num_ps 1 \
- --ps_resources memory=4G,vcores=2 \
- --ps_launch_cmd "python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --num-gpus=0" \
- --num_workers 4 \
- --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py --data-dir=hdfs://${dfs_name_service}/tmp/cifar-10-data --job-dir=hdfs://${dfs_name_service}/tmp/cifar-10-jobdir --train-steps=500 --eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=0"
-```
-
-
-## Issues
-
-### Issue 1: Nodemanager fails to start after an operating system reboot
-
-```
-2018-09-20 18:54:39,785 ERROR org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to bootstrap configured resource subsystems!
-org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Unexpected: Cannot create yarn cgroup Subsystem:cpu Mount points:/proc/mounts User:yarn Path:/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn
-  at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializePreMountedCGroupController(CGroupsHandlerImpl.java:425)
-  at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:377)
-  at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:98)
-  at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.bootstrap(CGroupsCpuResourceHandlerImpl.java:87)
-  at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
-  at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:320)
-  at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:389)
-  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
-  at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:929)
-  at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:997)
-2018-09-20 18:54:39,789 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED
-```
-
-Solution: as `root`, grant the `yarn` user permissions on `/sys/fs/cgroup/cpu,cpuacct`:
-
-```
-chown :yarn -R /sys/fs/cgroup/cpu,cpuacct
-chmod g+rwx -R /sys/fs/cgroup/cpu,cpuacct
-```
-
-When gpu support is enabled, permissions on the cgroup devices path are also needed:
-
-```
-chown :yarn -R /sys/fs/cgroup/devices
-chmod g+rwx -R /sys/fs/cgroup/devices
-```
-
-
-### Issue 2: container-executor permission problem
-
-```
-2018-09-21 09:36:26,102 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: IOException executing command:
-java.io.IOException: Cannot run program "/etc/yarn/sbin/Linux-amd64-64/container-executor": error=13, Permission denied
-        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
-        at org.apache.hadoop.util.Shell.runCommand(Shell.java:938)
-        at org.apache.hadoop.util.Shell.run(Shell.java:901)
-        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
-```
-
-The permissions of `/etc/yarn/sbin/Linux-amd64-64/container-executor` should be 6050.
-
-### Issue 3: Checking the system service startup logs
-
-```
-journalctl -u docker
-```
-
-### Issue 4: Docker cannot remove a container: `device or resource busy`
-
-```bash
-$ docker rm 0bfafa146431
-Error response from daemon: Unable to remove filesystem for 0bfafa146431771f6024dcb9775ef47f170edb2f1852f71916ba44209ca6120a: remove /app/docker/containers/0bfafa146431771f6024dcb9775ef47f170edb2f152f71916ba44209ca6120a/shm: device or resource busy
-```
-
-Write a `find-busy-mnt.sh` script to find the container mount files in `device or resource busy` state:
-
-```bash
-#!/bin/bash
-
-# A simple script to get information about mount points and pids and their
-# mount namespaces.
-
-if [ $# -ne 1 ]; then
-  echo "Usage: $0 <devicemapper-device-id>"
-  exit 1
-fi
-
-ID=$1
-
-MOUNTS=`find /proc/*/mounts | xargs grep $ID 2>/dev/null`
-
-[ -z "$MOUNTS" ] && echo "No pids found" && exit 0
-
-printf "PID\tNAME\t\tMNTNS\n"
-echo "$MOUNTS" | while read LINE; do
-  PID=`echo $LINE | cut -d ":" -f1 | cut -d "/" -f3`
-  # Ignore self and thread-self
-  if [ "$PID" == "self" ] || [ "$PID" == "thread-self" ]; then
-    continue
-  fi
-  NAME=`ps -q $PID -o comm=`
-  MNTNS=`readlink /proc/$PID/ns/mnt`
-  printf "%s\t%s\t\t%s\n" "$PID" "$NAME" "$MNTNS"
-done
-```
-
-Find the process occupying the directory, then kill it:
-
-```bash
-$ chmod +x find-busy-mnt.sh
-./find-busy-mnt.sh 0bfafa146431771f6024dcb9775ef47f170edb2f152f71916ba44209ca6120a
-# PID   NAME            MNTNS
-# 5007  ntpd            mnt:[4026533598]
-$ kill -9 5007
-```
-
-
-### Issue 5: The command sudo nvidia-docker run reports an error
-
-```
-docker: Error response from daemon: create nvidia_driver_361.42: VolumeDriver.Create: internal error, check logs for details.
-See 'docker run --help'.
-```
-
-Solution:
-
-```
-# Check whether the nvidia-docker service started properly
-$ systemctl status nvidia-docker
-$ journalctl -n -u nvidia-docker
-# Restart nvidia-docker
-systemctl stop nvidia-docker
-systemctl start nvidia-docker
-```
-
-### Issue 6: YARN fails to launch containers
-
-If the number of containers you request (PS + Workers) exceeds the total number of GPU cards, container creation may fail, because more containers would be created on a single server than the number of GPUs available on that machine.
diff --git a/dev-support/submarine-installer/README-CN.md 
b/dev-support/submarine-installer/README-CN.md
index a44f3ee..6527ae8 100644
--- a/dev-support/submarine-installer/README-CN.md
+++ b/dev-support/submarine-installer/README-CN.md
@@ -2,13 +2,13 @@
 
 ## 项目介绍
 
-介绍 **submarine-installer** 项目之前,首先要说明一下 **Hadoop {Submarine}**  这个项目,**Hadoop 
{Submarine}**  是 hadoop 3.2 版本中最新发布的机器学习框架子项目,他让 hadoop 支持 
`Tensorflow`、`MXNet`、`Caffe`、`Spark` 
等多种深度学习框架,提供了机器学习算法开发、分布式模型训练、模型管理和模型发布等全功能的系统框架,结合 hadoop 
与身俱来的数据存储和数据处理能力,让数据科学家们能够更好的挖掘和发挥出数据的价值。
+介绍 **submarine-installer** 项目之前,首先要说明一下 **Submarine**  这个项目,**Submarine** 是 
apache 顶级的机器学习平台项目。他将致力于支持 `Tensorflow`、`MXNet`、`Caffe`、`Spark` 
等多种深度学习框架,提供了机器学习算法开发、分布式模型训练、模型管理和模型发布等全功能的系统框架,结合 hadoop 
与身俱来的数据存储和数据处理能力,让数据科学家们能够更好的挖掘和发挥出数据的价值。
 
-hadoop 在 2.9 版本中就已经让 YARN 支持了 Docker 容器的资源调度模式,**Hadoop {Submarine}** 在此基础之上通过 
YARN 把分布式深度学习框架以 Docker 容器的方式进行调度和运行起来。
+hadoop 在 2.9 版本中就已经让 YARN 支持了 Docker 容器的资源调度模式,**Submarine** 在此基础之上通过 YARN 
把分布式深度学习框架以 Docker 容器的方式进行调度和运行起来。
 
 由于分布式深度学习框架需要运行在多个 Docker 
的容器之中,并且需要能够让运行在容器之中的各个服务相互协调,完成分布式机器学习的模型训练和模型发布等服务,这其中就会牵涉到 `DNS`、`Docker` 、 
`GPU`、`Network`、`显卡`、`操作系统内核` 修改等多个系统工程问题,正确的部署好 **Hadoop {Submarine}**  
的运行环境是一件很困难和耗时的事情。
 
-为了降低 hadoop 2.9 以上版本的 docker 等组件的部署难度,所以我们专门开发了这个用来部署 `Hadoop {Submarine} ` 
运行时环境的 `submarine-installer` 
项目,提供一键安装脚本,也可以分步执行安装、卸载、启动和停止各个组件,同时讲解每一步主要参数配置和注意事项。我们同时还向 hadoop 社区提交了部署 
`Hadoop {Submarine} ` 运行时环境的 [中文手册](InstallationGuideChineseVersion.md) 和 
[英文手册](InstallationGuide.md) ,帮助用户更容易的部署,发现问题也可以及时解决。
+为了降低 hadoop 2.9 以上版本的 docker 等组件的部署难度,所以我们专门开发了这个用来部署 `Submarine` 运行时环境的 
`submarine-installer` 
项目,提供一键安装脚本,也可以分步执行安装、卸载、启动和停止各个组件,同时讲解每一步主要参数配置和注意事项。我们同时提供了 
[中文手册](../../docs/helper/InstallationGuideChineseVersion.md) 和 
[英文手册](../../docs/helper/InstallationGuide.md) ,帮助用户更容易的部署,发现问题也可以及时解决。
 
 ## 先决条件
 
@@ -28,7 +28,7 @@ hadoop 在 2.9 版本中就已经让 YARN 支持了 Docker 容器的资源调度
 
   机器学习是一个计算密度型系统,对数据传输性能要求非常高,所以我们使用了网络效率损耗最小的 ETCD 网络组件,它可以通过 BGP 路由方式支持 
overlay 网络,同时在跨机房部署时支持隧道模式。
 
-  你需要选择至少三台以上的服务器作为 ETCD 的运行服务器,这样可以让 `Hadoop {Submarine} ` 有较好的容错性和稳定性。
+  你需要选择至少三台以上的服务器作为 ETCD 的运行服务器,这样可以让 `Submarine` 有较好的容错性和稳定性。
 
   在 **ETCD_HOSTS** 配置项中输入作为 ETCD 服务器的IP数组,参数配置一般是这样:
 
diff --git a/dev-support/submarine-installer/README.md 
b/dev-support/submarine-installer/README.md
index 1990ad4..0305331 100644
--- a/dev-support/submarine-installer/README.md
+++ b/dev-support/submarine-installer/README.md
@@ -3,13 +3,13 @@
 
 ## Introduction
 
-Hadoop {Submarine} is the latest machine learning framework subproject in the 
Hadoop 3.2 release. It allows Hadoop to support `Tensorflow`, `MXNet`,` Caffe`, 
`Spark`, etc. A variety of deep learning frameworks provide a full-featured 
system framework for machine learning algorithm development, distributed model 
training, model management, and model publishing, combined with hadoop's 
intrinsic data storage and data processing capabilities to enable data 
scientists to Good mining and the v [...]
+Submarine is the newest machine learning platform. It aims to support `Tensorflow`, `MXNet`, `Caffe`, `Spark`, and other deep learning frameworks, and provides a full-featured system for machine learning algorithm development, distributed model training, model management, and model publishing. Combined with hadoop's intrinsic data storage and data processing capabilities, it enables data scientists to better mine and unlock the value of their data.
 
-Hadoop has enabled YARN to support Docker container since 2.x. **Hadoop 
{Submarine}** then uses YARN to schedule and run the distributed deep learning 
framework in the form of a Docker container.
+Hadoop has enabled YARN to support Docker container since 2.x. **Submarine** 
then uses YARN to schedule and run the distributed deep learning framework in 
the form of a Docker container.
 
-Since the distributed deep learning framework needs to run in multiple Docker 
containers and needs to be able to coordinate the various services running in 
the container, complete the services of model training and model publishing for 
distributed machine learning. Involving multiple system engineering problems 
such as `DNS`, `Docker`, `GPU`, `Network`, `graphics card`, `operating system 
kernel` modification, etc. It is very difficult and time-consuming to properly 
deploy the **Hadoop {S [...]
+Since the distributed deep learning framework needs to run in multiple Docker 
containers and needs to be able to coordinate the various services running in 
the container, complete the services of model training and model publishing for 
distributed machine learning. Involving multiple system engineering problems 
such as `DNS`, `Docker`, `GPU`, `Network`, `graphics card`, `operating system 
kernel` modification, etc. It is very difficult and time-consuming to properly 
deploy the **Submarine [...]
 
-In order to reduce the difficulty of deploying components, we have developed 
this **submarine-installer** project to deploy the **Hadoop {Submarine}** 
runtime environment, providing a one-click installation script or step-by-step 
installation. Unload, start, and stop individual components, and explain the 
main parameter configuration and considerations for each step. We also 
submitted a [Chinese manual](InstallationGuideChineseVersion.md) and an 
[English manual](InstallationGuide.md) for [...]
+In order to reduce the difficulty of deploying components, we have developed this **submarine-installer** project to deploy the **Submarine** runtime environment, providing a one-click installation script as well as step-by-step install, uninstall, start, and stop of individual components, and explaining the main parameter configuration and considerations for each step. We also provide a [Chinese manual](../../docs/helper/InstallationGuideChineseVersion.md) and an [English manual](../../docs/helper [...]
 
 This installer is just created for your convenience and for test purpose only. 
You can choose to install required libraries by yourself, please don't run this 
script in your production envionrment before fully validate it in a sandbox 
environment.
 
@@ -33,7 +33,7 @@ Before deploying with submarine-installer, you can refer to 
the existing configu
 
   Please note that you can choose to use different Docker networks. ETCD is 
not the only network solution supported by Submarine.
 
-  You need to select at least three servers as the running server for ETCD, 
which will make **Hadoop {Submarine}** better fault tolerant and stable.
+  You need to select at least three servers as the running server for ETCD, 
which will make **Submarine** better fault tolerant and stable.
 
   Enter the IP array as the ETCD server in the ETCD_HOSTS configuration item. 
The parameter configuration is generally like this:
 
diff --git a/docs/README.md b/docs/README.md
index 0331d49..fecafcc 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -36,9 +36,9 @@ Click below contents if you want to understand more.
 
 Here're some examples about Submarine usage.
 
-[Running Distributed CIFAR 10 Tensorflow 
Job](./helper/RunningDistributedCifar10TFJobs.md)
+[Running Distributed CIFAR 10 Tensorflow 
Job_With_Yarn_Service_Runtime](helper/RunningDistributedCifar10TFJobsWithYarnService.md)
 
-[Running Standalone CIFAR 10 PyTorch 
Job](./helper/RunningSingleNodeCifar10PTJobs.md)
+[Running Standalone CIFAR 10 PyTorch 
Job_With_Yarn_Service_Runtime](helper/RunningSingleNodeCifar10PTJobsWithYarnService.md)
 
 [Running Distributed thchs30 Kaldi 
Job](./ecosystem/kaldi/RunningDistributedThchs30KaldiJobs.md)
 
diff --git a/docs/helper/InstallationGuide.md b/docs/helper/InstallationGuide.md
index 3fc537d..b2a4edb 100644
--- a/docs/helper/InstallationGuide.md
+++ b/docs/helper/InstallationGuide.md
@@ -1,18 +1,3 @@
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one or more
-   contributor license agreements.  See the NOTICE file distributed with
-   this work for additional information regarding copyright ownership.
-   The ASF licenses this file to You under the Apache License, Version 2.0
-   (the "License"); you may not use this file except in compliance with
-   the License.  You may obtain a copy of the License at
-   http://www.apache.org/licenses/LICENSE-2.0
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
--->
-
 # Submarine Installation Guide
 
 ## Prerequisites
@@ -25,8 +10,8 @@ The operating system and kernel versions we have tested are 
as shown in the foll
 
 | Enviroment | Verion |
 | ------ | ------ |
-| Operating System | centos-release-7-5.1804.el7.centos.x86_64 |
-| Kernal | 3.10.0-862.el7.x86_64 |
+| Operating System | centos-release-7-3.1611.el7.centos.x86_64 |
+| Kernal | 3.10.0-514.el7.x86_64 |
 
 ### User & Group
 
@@ -63,8 +48,8 @@ yum install gcc make g++
 # Approach 1:
 yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
 # Approach 2:
-wget 
http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-headers-3.10.0-862.el7.x86_64.rpm
-rpm -ivh kernel-headers-3.10.0-862.el7.x86_64.rpm
+wget 
http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-headers-3.10.0-514.el7.x86_64.rpm
+rpm -ivh kernel-headers-3.10.0-514.el7.x86_64.rpm
 ```
 
 ### GPU Servers (Only for Nvidia GPU equipped nodes)
@@ -166,43 +151,26 @@ 
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
 
 ### Docker Installation
 
-The following steps show how to install docker 18.06.1.ce. You can choose 
other approaches to install Docker.
+We recommend using Docker version >= 1.12.5. The following steps are for reference only; you can always choose another approach to install Docker.
 
 ```
-# Remove old version docker
-sudo yum remove docker \
-                docker-client \
-                docker-client-latest \
-                docker-common \
-                docker-latest \
-                docker-latest-logrotate \
-                docker-logrotate \
-                docker-engine
-
-# Docker version
-export DOCKER_VERSION="18.06.1.ce"
-# Setup the repository
-sudo yum install -y yum-utils \
-  device-mapper-persistent-data \
-  lvm2
-sudo yum-config-manager \
-    --add-repo \
-    https://download.docker.com/linux/centos/docker-ce.repo
-
-# Check docker version
-yum list docker-ce --showduplicates | sort -r
+yum -y update
+yum -y install yum-utils
+yum-config-manager --add-repo https://yum.dockerproject.org/repo/main/centos/7
+yum -y update
 
-# Install docker with specified DOCKER_VERSION
-sudo yum install -y docker-ce-${DOCKER_VERSION} 
docker-ce-cli-${DOCKER_VERSION} containerd.io
+# Show available packages
+yum search --showduplicates docker-engine
 
-# Start docker
+# Install docker 1.12.5
+yum -y --nogpgcheck install docker-engine-1.12.5*
 systemctl start docker
 
 chown hadoop:netease /var/run/docker.sock
 chown hadoop:netease /usr/bin/docker
 ```
 
-Reference:https://docs.docker.com/install/linux/docker-ce/centos/
+Reference:https://docs.docker.com/cs-engine/1.12/
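After installation, a quick sanity check that the installed client meets the recommended minimum can be scripted. This is only an illustrative sketch: the hard-coded `DOCKER_VERSION` stands in for the real output of `docker version`.

```shell
# Sketch: verify the installed Docker version meets the recommended minimum (1.12.5).
# DOCKER_VERSION is hard-coded for illustration; on a real node use:
#   DOCKER_VERSION=$(docker version --format '{{.Client.Version}}')
DOCKER_VERSION="1.12.5"
MINIMUM="1.12.5"
# sort -V orders version strings; if MINIMUM sorts first (or equal), the check passes
if [ "$(printf '%s\n%s\n' "$MINIMUM" "$DOCKER_VERSION" | sort -V | head -n1)" = "$MINIMUM" ]; then
  echo "Docker ${DOCKER_VERSION} meets the minimum ${MINIMUM}"
else
  echo "Docker ${DOCKER_VERSION} is older than ${MINIMUM}; please upgrade"
fi
```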
 
 ### Docker Configuration
 
@@ -226,40 +194,46 @@ sudo systemctl restart docker
 
 
 
-### Check docker version
+### Check Docker version
 
 ```bash
 $ docker version
 
 Client:
- Version:      18.06.1-ce
- API version:  1.38
- Go version:   go1.10.3
- Git commit:   e68fc7a
- Built:        Tue Aug 21 17:23:03 2018
+ Version:      1.12.5
+ API version:  1.24
+ Go version:   go1.6.4
+ Git commit:   7392c3b
+ Built:        Fri Dec 16 02:23:59 2016
  OS/Arch:      linux/amd64
- Experimental: false
 
 Server:
- Version:      18.06.1-ce
- API version:  1.38 (minimum version 1.12)
- Go version:   go1.10.3
- Git commit:   e68fc7a
- Built:        Tue Aug 21 17:23:03 2018
+ Version:      1.12.5
+ API version:  1.24
+ Go version:   go1.6.4
+ Git commit:   7392c3b
+ Built:        Fri Dec 16 02:23:59 2016
  OS/Arch:      linux/amd64
- Experimental: false
 ```
 
 ### Nvidia-docker Installation (Only for Nvidia GPU equipped nodes)
 
-Submarine has already supported nvidia-docker V2
+Submarine depends on nvidia-docker version 1.0.
 
 ```
-# Add the package repositories
-distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
-curl -s -L 
https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo
 | \
-  sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
-sudo yum install -y nvidia-docker2-2.0.3-1.docker18.06.1.ce
+wget -P /tmp 
https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
+sudo rpm -i /tmp/nvidia-docker*.rpm
+# Start nvidia-docker
+sudo systemctl start nvidia-docker
+
+# Check nvidia-docker status:
+systemctl status nvidia-docker
+
+# Check nvidia-docker log:
+journalctl -u nvidia-docker
+
+# Test nvidia-docker-plugin
+curl http://localhost:3476/v1.0/docker/cli
 ```
 
 According to `nvidia-driver` version, add folders under the path of  
`/var/lib/nvidia-docker/volumes/nvidia_driver/`
@@ -276,7 +250,7 @@ cp /usr/lib64/libcuda* 
/var/lib/nvidia-docker/volumes/nvidia_driver/390.87/lib64
 cp /usr/lib64/libnvidia* 
/var/lib/nvidia-docker/volumes/nvidia_driver/390.87/lib64
 
 # Test with nvidia-smi
-nvidia-docker run --rm nvidia/cuda:10.0-devel nvidia-smi
+nvidia-docker run --rm nvidia/cuda:9.0-devel nvidia-smi
 ```
 
 Test docker, nvidia-docker, nvidia-driver installation
@@ -295,17 +269,16 @@ import tensorflow as tf
 tf.test.is_gpu_available()
 ```
 
-The way to uninstall nvidia-docker V2
-```
-sudo yum remove -y nvidia-docker2-2.0.3-1.docker18.06.1.ce
-```
+[How to uninstall nvidia-docker 1.0](https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0))
 
 Reference:
-https://github.com/NVIDIA/nvidia-docker
+https://github.com/NVIDIA/nvidia-docker/tree/1.0
+
 
 ### Tensorflow Image
 
-There is no need to install CUDNN and CUDA on the servers, because CUDNN and 
CUDA can be added in the docker images. We can get basic docker images by 
referring to [Write Dockerfile](WriteDockerfileTF.md).
+To build a Tensorflow image, please refer to [WriteDockerfileTF.md](WriteDockerfileTF.md)
+
 
 ### Test tensorflow in a docker container
 
@@ -335,6 +308,124 @@ If there are some errors, we could check the following 
configuration.
    ls -l /usr/local/nvidia/lib64 | grep libcuda.so
    ```
 
+
+## Hadoop Installation
+
+### Get Hadoop Release
+You can either get a Hadoop release binary or compile it from source. Please follow the guides on the [Hadoop Homepage](https://hadoop.apache.org/).
+For Hadoop cluster setup, please refer to [Hadoop Cluster Setup](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html)
+
+
+### Start YARN services
+
+```
+YARN_LOGFILE=resourcemanager.log ./sbin/yarn-daemon.sh start resourcemanager
+YARN_LOGFILE=nodemanager.log ./sbin/yarn-daemon.sh start nodemanager
+YARN_LOGFILE=timeline.log ./sbin/yarn-daemon.sh start timelineserver
+YARN_LOGFILE=mr-historyserver.log ./sbin/mr-jobhistory-daemon.sh start 
historyserver
+```
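To confirm the daemons came up, check that the expected processes appear in `jps` output. A minimal sketch follows; `JPS_OUTPUT` is a hypothetical sample, and on a real node you would use `JPS_OUTPUT=$(jps)`.

```shell
# Hypothetical jps output; on a real node: JPS_OUTPUT=$(jps)
JPS_OUTPUT="12345 ResourceManager
12346 NodeManager
12347 JobHistoryServer
12348 ApplicationHistoryServer"
# Report which expected YARN daemons are present
for daemon in ResourceManager NodeManager JobHistoryServer; do
  if echo "$JPS_OUTPUT" | grep -q "$daemon"; then
    echo "$daemon is running"
  else
    echo "$daemon is NOT running"
  fi
done
```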
+
+### Test with a MR wordcount job
+
+```
+./bin/hadoop jar 
/home/hadoop/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar
 wordcount /tmp/wordcount.txt /tmp/wordcount-output4
+```
+
+## Yarn Configurations for GPU (Only if Nvidia GPU is used)
+
+### GPU configurations for both resourcemanager and nodemanager
+
+Add the yarn resource configuration file, named resource-types.xml
+
+   ```
+   <configuration>
+     <property>
+       <name>yarn.resource-types</name>
+       <value>yarn.io/gpu</value>
+     </property>
+   </configuration>
+   ```
+
+#### GPU configurations for resourcemanager
+
+The scheduler used by the resourcemanager must be the capacity scheduler, and yarn.scheduler.capacity.resource-calculator in capacity-scheduler.xml should be set to DominantResourceCalculator
+
+   ```
+   <configuration>
+     <property>
+       <name>yarn.scheduler.capacity.resource-calculator</name>
+       
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
+     </property>
+   </configuration>
+   ```
+
+#### GPU configurations for nodemanager
+
+Add configurations in yarn-site.xml
+
+   ```
+   <configuration>
+     <property>
+       <name>yarn.nodemanager.resource-plugins</name>
+       <value>yarn.io/gpu</value>
+     </property>
+   </configuration>
+   ```
+
+Add configurations in container-executor.cfg
+
+   ```
+   [docker]
+   ...
+   # Add configurations in `[docker]` part:
+   # /usr/bin/nvidia-docker is the path of nvidia-docker command
+   # nvidia_driver_375.26 means that nvidia driver version is <version>. 
nvidia-smi command can be used to check the version
+   docker.allowed.volume-drivers=/usr/bin/nvidia-docker
+   
docker.allowed.devices=/dev/nvidiactl,/dev/nvidia-uvm,/dev/nvidia-uvm-tools,/dev/nvidia1,/dev/nvidia0
+   docker.allowed.ro-mounts=nvidia_driver_<version>
+
+   [gpu]
+   module.enabled=true
+
+   [cgroups]
+   # /sys/fs/cgroup is the cgroup mount destination
+   # /hadoop-yarn is the path yarn creates by default
+   root=/sys/fs/cgroup
+   yarn-hierarchy=/hadoop-yarn
+   ```
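The `<version>` placeholder above must match the installed driver. A sketch of extracting it follows; the `SMI_LINE` sample is hypothetical, and in practice you would parse the real `nvidia-smi` output.

```shell
# Hypothetical nvidia-smi banner line; on a real GPU node parse `nvidia-smi` itself
SMI_LINE="| NVIDIA-SMI 375.26                 Driver Version: 375.26         |"
# Extract the numeric driver version from the line
DRIVER_VERSION=$(echo "$SMI_LINE" | grep -oE 'Driver Version: [0-9.]+' | awk '{print $3}')
# This value fills in docker.allowed.ro-mounts=nvidia_driver_<version>
echo "nvidia_driver_${DRIVER_VERSION}"
```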
+
+## Tensorflow Job with YARN Runtime
+
+### Run a tensorflow job in a zipped python virtual environment
+
+Use build_python_virtual_env.sh in the directory
+${SUBMARINE_REPO_PATH}/dev-support/mini-submarine/submarine/ to build a zipped Python virtual
+environment, where ${SUBMARINE_REPO_PATH} is the root of the Submarine repository.
+The generated zip file can be named myvenv.zip.
+
+Copy ${SUBMARINE_REPO_PATH}/dev-support/mini-submarine/submarine/run_submarine_mnist_tony.sh
+to the server from which you submit jobs, and modify the variables SUBMARINE_VERSION,
+HADOOP_VERSION, SUBMARINE_PATH, HADOOP_CONF_PATH and MNIST_PATH in it according to your
+environment. If Kerberos is enabled, remove the --insecure parameter from the command.
+
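For illustration, the variables at the top of run_submarine_mnist_tony.sh might be set as below. All values here are hypothetical; substitute your own versions and paths.

```shell
# Hypothetical values for the variables in run_submarine_mnist_tony.sh
SUBMARINE_VERSION=0.3.0
HADOOP_VERSION=2.9
SUBMARINE_PATH=/opt/submarine-current
HADOOP_CONF_PATH=/etc/hadoop/conf
MNIST_PATH=/tmp/mnist
echo "Submarine ${SUBMARINE_VERSION} on Hadoop ${HADOOP_VERSION}, conf at ${HADOOP_CONF_PATH}"
```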
+Run a distributed tensorflow job.
+```
+./run_submarine_mnist_tony.sh -d http://yann.lecun.com/exdb/mnist/
+```
+The -d parameter specifies the URL from which the MNIST data is downloaded.
+
+### Run a tensorflow job in a docker container (TODO)
+
+
+## Yarn Service Runtime Requirement (Deprecated)
+
+The "yarn native service" feature has been available since Hadoop 3.1.0, and Submarine
+can use it to submit an ML job. However, it requires several additional components that
+are hard to enable and maintain, so the yarn service runtime has been deprecated since
+Submarine 0.3.0. We recommend using YarnRuntime instead. If you still want to enable it,
+please follow these steps.
+
 ### Etcd Installation
 
 etcd is a distributed reliable key-value store for the most critical data of a 
distributed system, Registration and discovery of services used in containers.
@@ -363,8 +454,6 @@ $ etcdctl member list
 b3d05464c356441a: name=etcdnode1 peerURLs=http://${etcd_host_ip3}:2380 
clientURLs=http://${etcd_host_ip3}:2379 isLeader=true
 ```
 
-
-
 ### Calico Installation
 
 Calico creates and manages a flat three-tier network, and each container is 
assigned a routable ip. We just add the steps here for your convenience.
@@ -406,117 +495,45 @@ docker run --net calico-network --name workload-B -tid 
busybox
 docker exec workload-A ping workload-B
 ```
 
-
-## Hadoop Installation
-
-### Get Hadoop Release
-You can either get Hadoop release binary or compile from source code. Please 
follow the https://hadoop.apache.org/ guides.
-
-
-### Start YARN Service
-
-```
-YARN_LOGFILE=resourcemanager.log ./sbin/yarn-daemon.sh start resourcemanager
-YARN_LOGFILE=nodemanager.log ./sbin/yarn-daemon.sh start nodemanager
-YARN_LOGFILE=timeline.log ./sbin/yarn-daemon.sh start timelineserver
-YARN_LOGFILE=mr-historyserver.log ./sbin/mr-jobhistory-daemon.sh start 
historyserver
-```
-
-### Start YARN Registery DNS Service (only when using YARN native service 
runtime)
+### Enable calico network for docker containers
+Configure yarn-site.xml to use calico-network as the default network for docker containers:
 
 ```
-sudo YARN_LOGFILE=registrydns.log ./yarn-daemon.sh start registrydns
+  <property>
+    <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
+    <value>calico-network</value>
+  </property>
+  <property>
+    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
+    <value>default,docker</value>
+  </property>
+  <property>
+    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
+    <value>host,none,bridge,calico-network</value>
+  </property>
 ```
 
-### Test with a MR wordcount job
+Add calico-network to container-executor.cfg
 
 ```
-./bin/hadoop jar 
/home/hadoop/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar
 wordcount /tmp/wordcount.txt /tmp/wordcount-output4
+docker.allowed.networks=bridge,host,none,calico-network
 ```
 
-### GPU configurations for both resourcemanager and nodemanager
-
-Add the yarn resource configuration file, named resource-types.xml
-
-   ```
-   <configuration>
-     <property>
-       <name>yarn.resource-types</name>
-       <value>yarn.io/gpu</value>
-     </property>
-   </configuration>
-   ```
-
-### GPU configurations for resourcemanager
-
-The scheduler used by resourcemanager must be  capacity scheduler, and 
yarn.scheduler.capacity.resource-calculator in  capacity-scheduler.xml should 
be DominantResourceCalculator
-
-   ```
-   <configuration>
-     <property>
-       <name>yarn.scheduler.capacity.resource-calculator</name>
-       
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
-     </property>
-   </configuration>
-   ```
-
-### GPU configurations for nodemanager
-
-Add configurations in yarn-site.xml
-
-   ```
-   <configuration>
-     <property>
-       <name>yarn.nodemanager.resource-plugins</name>
-       <value>yarn.io/gpu</value>
-     </property>
-     <!--Use nvidia docker v2-->
-     <property>
-        <name>yarn.nodemanager.resource-plugins.gpu.docker-plugin</name>
-        <value>nvidia-docker-v2</value>
-     </property>
-   </configuration>
-   ```
-
-Add configurations in container-executor.cfg
+Then restart all nodemanagers.
 
-   ```
-   [docker]
-   ...
-   # Add configurations in `[docker]` part:
-   # /usr/bin/nvidia-docker is the path of nvidia-docker command
-   # nvidia_driver_375.26 means that nvidia driver version is <version>. 
nvidia-smi command can be used to check the version
-   docker.allowed.volume-drivers=/usr/bin/nvidia-docker
-   
docker.allowed.devices=/dev/nvidiactl,/dev/nvidia-uvm,/dev/nvidia-uvm-tools,/dev/nvidia1,/dev/nvidia0
-   docker.allowed.ro-mounts=nvidia_driver_<version>
-   # Use nvidia docker v2
-   docker.allowed.runtimes=nvidia
+### Start yarn registry DNS service
 
-   [gpu]
-   module.enabled=true
+The yarn registry DNS server exposes service-discovery information via DNS and maps
+docker container names to IPs. With it, the containers of an ML job know how to
+communicate with each other.
 
-   [cgroups]
-   # /sys/fs/cgroup is the cgroup mount destination
-   # /hadoop-yarn is the path yarn creates by default
-   root=/sys/fs/cgroup
-   yarn-hierarchy=/hadoop-yarn
-   ```
+Please pick a server on which to start the yarn registry DNS service. For details please
+refer to [Registry DNS Server](http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html)
 
-### Run a distributed tensorflow gpu job
-
-```bash
- ... job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-current --name distributed-tf-gpu \
- --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
- --docker_image tf-1.13.1-gpu:0.0.1 \
- --input_path hdfs://default/tmp/cifar-10-data \
- --checkpoint_path hdfs://default/user/hadoop/tf-distributed-checkpoint \
- --num_ps 0 \
- --ps_resources memory=4G,vcores=2,gpu=0 \
- --ps_launch_cmd "python /test/cifar10_estimator/cifar10_main.py 
--data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=0" \
- --worker_resources memory=4G,vcores=2,gpu=1 --verbose \
- --num_workers 1 \
- --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py 
--data-dir=%input_path% --job-dir=%checkpoint_path% --train-steps=500 
--eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=1"
 ```
+sudo YARN_LOGFILE=registrydns.log ./yarn-daemon.sh start registrydns
+```
+
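For reference, enabling RegistryDNS typically involves a few yarn-site.xml properties. The property names below follow the Hadoop 3.1 RegistryDNS documentation linked above; the values are placeholders to adapt to your cluster.

```
<property>
  <name>hadoop.registry.dns.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.registry.dns.domain-name</name>
  <value>ycluster</value>
</property>
<property>
  <name>hadoop.registry.dns.bind-port</name>
  <value>5335</value>
</property>
```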
+### Run a submarine job with yarn service runtime
 
+Please refer to [Running Distributed CIFAR 10 Tensorflow Job With Yarn Service Runtime](RunningDistributedCifar10TFJobsWithYarnService.md)
\ No newline at end of file
diff --git a/docs/helper/InstallationGuideChineseVersion.md 
b/docs/helper/InstallationGuideChineseVersion.md
index 490bc9f..1542c55 100644
--- a/docs/helper/InstallationGuideChineseVersion.md
+++ b/docs/helper/InstallationGuideChineseVersion.md
@@ -1,30 +1,15 @@
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one or more
-   contributor license agreements.  See the NOTICE file distributed with
-   this work for additional information regarding copyright ownership.
-   The ASF licenses this file to You under the Apache License, Version 2.0
-   (the "License"); you may not use this file except in compliance with
-   the License.  You may obtain a copy of the License at
-   http://www.apache.org/licenses/LICENSE-2.0
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
--->
-
 # Submarine 安装说明
 
 ## Prerequisites
 
 ### 操作系统
 
-我们使用的操作系统版本是 centos-release-7-5.1804.el7.centos.x86_64, 内核版本是 
3.10.0-862.el7.x86_64。
+我们使用的操作系统版本是 centos-release-7-3.1611.el7.centos.x86_64,内核版本是 3.10.0-514.el7.x86_64,这是我们验证过的最低版本。
 
 | Enviroment | Verion |
 | ------ | ------ |
-| Operating System | centos-release-7-5.1804.el7.centos.x86_64 |
-| Kernal | 3.10.0-862.el7.x86_64 |
+| Operating System | centos-release-7-3.1611.el7.centos.x86_64 |
+| Kernal | 3.10.0-514.el7.x86_64 |
 
 ### User & Group
 
@@ -59,8 +44,8 @@ yum install gcc make g++
 # 方法一:
 yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
 # 方法二:
-wget 
http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-headers-3.10.0-862.el7.x86_64.rpm
-rpm -ivh kernel-headers-3.10.0-862.el7.x86_64.rpm
+wget 
http://vault.centos.org/7.3.1611/os/x86_64/Packages/kernel-headers-3.10.0-514.el7.x86_64.rpm
+rpm -ivh kernel-headers-3.10.0-514.el7.x86_64.rpm
 ```
 
 ### 检查 GPU 版本
@@ -87,7 +72,7 @@ sudo /usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl
 sudo /usr/bin/nvidia-uninstall
 ```
 
-安装nvidia-detect,用于检查显卡版本
+安装 nvidia-detect,用于检查显卡版本
 
 ```
 yum install nvidia-detect
@@ -100,7 +85,7 @@ This device requires the current 390.87 NVIDIA driver 
kmod-nvidia
 An Intel display controller was also detected
 ```
 
-注意这里的信息 [Quadro K620] 和390.87。
+注意这里的信息 [Quadro K620] 和 390.87。
 下载 
[NVIDIA-Linux-x86_64-390.87.run](https://www.nvidia.com/object/linux-amd64-display-archive.html)
 
 
@@ -156,40 +141,23 @@ 
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
 ### 安装 Docker
 
 ```
-# Remove old version docker
-sudo yum remove docker \
-                docker-client \
-                docker-client-latest \
-                docker-common \
-                docker-latest \
-                docker-latest-logrotate \
-                docker-logrotate \
-                docker-engine
-
-# Docker version
-export DOCKER_VERSION="18.06.1.ce"
-# Setup the repository
-sudo yum install -y yum-utils \
-  device-mapper-persistent-data \
-  lvm2
-sudo yum-config-manager \
-    --add-repo \
-    https://download.docker.com/linux/centos/docker-ce.repo
-
-# Check docker version
-yum list docker-ce --showduplicates | sort -r
-
-# Install docker with specified DOCKER_VERSION
-sudo yum install -y docker-ce-${DOCKER_VERSION} 
docker-ce-cli-${DOCKER_VERSION} containerd.io
-
-# Start docker
+yum -y update
+yum -y install yum-utils
+yum-config-manager --add-repo https://yum.dockerproject.org/repo/main/centos/7
+yum -y update
+
+# 显示 available 的安装包
+yum search --showduplicates docker-engine
+
+# 安装 1.12.5 版本 docker
+yum -y --nogpgcheck install docker-engine-1.12.5*
 systemctl start docker
 
 chown hadoop:netease /var/run/docker.sock
 chown hadoop:netease /usr/bin/docker
 ```
 
-Reference:https://docs.docker.com/install/linux/docker-ce/centos/
+Reference:https://docs.docker.com/cs-engine/1.12/
 
 ### 配置 Docker
 
@@ -213,40 +181,46 @@ sudo systemctl restart docker
 
 
 
-### 检查 Docker version
+### 检查 Docker 版本
 
 ```bash
 $ docker version
 
 Client:
- Version:      18.06.1-ce
- API version:  1.38
- Go version:   go1.10.3
- Git commit:   e68fc7a
- Built:        Tue Aug 21 17:23:03 2018
+ Version:      1.12.5
+ API version:  1.24
+ Go version:   go1.6.4
+ Git commit:   7392c3b
+ Built:        Fri Dec 16 02:23:59 2016
  OS/Arch:      linux/amd64
- Experimental: false
 
 Server:
- Version:      18.06.1-ce
- API version:  1.38 (minimum version 1.12)
- Go version:   go1.10.3
- Git commit:   e68fc7a
- Built:        Tue Aug 21 17:23:03 2018
+ Version:      1.12.5
+ API version:  1.24
+ Go version:   go1.6.4
+ Git commit:   7392c3b
+ Built:        Fri Dec 16 02:23:59 2016
  OS/Arch:      linux/amd64
- Experimental: false
 ```
 
 ### 安装 nvidia-docker
 
-Hadoop-3.2 的 submarine 已支持 V2 版本的 nvidia-docker
+Hadoop-3.2 的 submarine 使用的是 1.0 版本的 nvidia-docker
 
 ```
-# Add the package repositories
-distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
-curl -s -L 
https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo
 | \
-  sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
-sudo yum install -y nvidia-docker2-2.0.3-1.docker18.06.1.ce
+wget -P /tmp 
https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
+sudo rpm -i /tmp/nvidia-docker*.rpm
+# 启动 nvidia-docker
+sudo systemctl start nvidia-docker
+
+# 查看 nvidia-docker 状态:
+systemctl status nvidia-docker
+
+# 查看 nvidia-docker 日志:
+journalctl -u nvidia-docker
+
+# 查看 nvidia-docker-plugin 是否正常
+curl http://localhost:3476/v1.0/docker/cli
 ```
 
 在 `/var/lib/nvidia-docker/volumes/nvidia_driver/` 路径下,根据 `nvidia-driver` 
的版本创建文件夹:
@@ -263,7 +237,7 @@ cp /usr/lib64/libcuda* 
/var/lib/nvidia-docker/volumes/nvidia_driver/390.87/lib64
 cp /usr/lib64/libnvidia* 
/var/lib/nvidia-docker/volumes/nvidia_driver/390.87/lib64
 
 # Test nvidia-smi
-nvidia-docker run --rm nvidia/cuda:10.0-devel nvidia-smi
+nvidia-docker run --rm nvidia/cuda:9.0-devel nvidia-smi
 ```
 
 测试 docker, nvidia-docker, nvidia-driver 安装
@@ -282,19 +256,17 @@ import tensorflow as tf
 tf.test.is_gpu_available()
 ```
 
-卸载 nvidia-docker V2 的方法:
-```
-sudo yum remove -y nvidia-docker2-2.0.3-1.docker18.06.1.ce
-```
+卸载 nvidia-docker 1.0 的方法:
+https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
 
 reference:
-https://github.com/NVIDIA/nvidia-docker
+https://github.com/NVIDIA/nvidia-docker/tree/1.0
 
 
 
 ### Tensorflow Image
 
-CUDNN 和 CUDA 其实不需要在物理机上安装,因为 Submarine 中提供了已经包含了CUDNN 和 CUDA 
的镜像文件,基础的Dockfile可参见[WriteDockerfile](docs/0.2.0/WriteDockerfileTF)
+创建 tensorflow docker image 
的方法,可以参考文档[WriteDockerfileTF.md](WriteDockerfileTF.md)
 
 ### 测试 TF 环境
 
@@ -323,83 +295,16 @@ $ python >> tf.__version__
    ls -l /usr/local/nvidia/lib64 | grep libcuda.so
    ```
 
-### 安装 Etcd
-
-运行 Submarine/install.sh 脚本,就可以在指定服务器中安装 Etcd 组件和服务自启动脚本。
-
-```shell
-$ ./Submarine/install.sh
-# 通过如下命令查看 Etcd 服务状态
-systemctl status Etcd.service
-```
-
-检查 Etcd 服务状态
-
-```shell
-$ etcdctl cluster-health
-member 3adf2673436aa824 is healthy: got healthy result from 
http://${etcd_host_ip1}:2379
-member 85ffe9aafb7745cc is healthy: got healthy result from 
http://${etcd_host_ip2}:2379
-member b3d05464c356441a is healthy: got healthy result from 
http://${etcd_host_ip3}:2379
-cluster is healthy
-
-$ etcdctl member list
-3adf2673436aa824: name=etcdnode3 peerURLs=http://${etcd_host_ip1}:2380 
clientURLs=http://${etcd_host_ip1}:2379 isLeader=false
-85ffe9aafb7745cc: name=etcdnode2 peerURLs=http://${etcd_host_ip2}:2380 
clientURLs=http://${etcd_host_ip2}:2379 isLeader=false
-b3d05464c356441a: name=etcdnode1 peerURLs=http://${etcd_host_ip3}:2380 
clientURLs=http://${etcd_host_ip3}:2379 isLeader=true
-```
-其中,${etcd_host_ip*} 是etcd服务器的ip
-
-
-### 安装 Calico
-
-运行 Submarine/install.sh 脚本,就可以在指定服务器中安装 Calico 组件和服务自启动脚本。
-
-```
-systemctl start calico-node.service
-systemctl status calico-node.service
-```
-
-#### 检查 Calico 网络
-
-```shell
-# 执行如下命令,注意:不会显示本服务器的状态,只显示其他的服务器状态
-$ calicoctl node status
-Calico process is running.
-
-IPv4 BGP status
-+---------------+-------------------+-------+------------+-------------+
-| PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |    INFO     |
-+---------------+-------------------+-------+------------+-------------+
-| ${host_ip1} | node-to-node mesh | up    | 2018-09-21 | Established |
-| ${host_ip2} | node-to-node mesh | up    | 2018-09-21 | Established |
-| ${host_ip3} | node-to-node mesh | up    | 2018-09-21 | Established |
-+---------------+-------------------+-------+------------+-------------+
-
-IPv6 BGP status
-No IPv6 peers found.
-```
-
-创建docker container,验证calico网络
-
-```
-docker network create --driver calico --ipam-driver calico-ipam calico-network
-docker run --net calico-network --name workload-A -tid busybox
-docker run --net calico-network --name workload-B -tid busybox
-docker exec workload-A ping workload-B
-```
-
-
 ## 安装 Hadoop
 
-### 编译 Hadoop
-
-```
-mvn package -Pdist -DskipTests -Dtar
-```
+### 安装 Hadoop
+首先,我们通过源码编译或者直接从官网 [Hadoop Homepage](https://hadoop.apache.org/)下载获取 hadoop 包。
+然后,请参考 [Hadoop Cluster 
Setup](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html)
+进行 Hadoop 集群安装。
 
 
 
-### 启动 YARN服务
+### 启动 YARN 服务
 
 ```
 YARN_LOGFILE=resourcemanager.log ./sbin/yarn-daemon.sh start resourcemanager
@@ -408,13 +313,6 @@ YARN_LOGFILE=timeline.log ./sbin/yarn-daemon.sh start 
timelineserver
 YARN_LOGFILE=mr-historyserver.log ./sbin/mr-jobhistory-daemon.sh start 
historyserver
 ```
 
-### 启动 registery dns 服务 (只有YARN native service 需要)
-
-```
-sudo YARN_LOGFILE=registrydns.log ./yarn-daemon.sh start registrydns
-```
-
-
 
 ### 测试 wordcount
 
@@ -424,7 +322,7 @@ sudo YARN_LOGFILE=registrydns.log ./yarn-daemon.sh start 
registrydns
 ./bin/hadoop jar 
/home/hadoop/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar
 wordcount /tmp/wordcount.txt /tmp/wordcount-output4
 ```
 
-## 在YARN上使用GPU
+## yarn 使用 GPU 的配置
 
 ### Resourcemanager, Nodemanager 中添加GPU支持
 
@@ -462,11 +360,6 @@ resourcemanager 使用的 scheduler 必须是 capacity scheduler,在 
capacity-
        <name>yarn.nodemanager.resource-plugins</name>
        <value>yarn.io/gpu</value>
      </property>
-     <!--Use nvidia docker v2-->
-     <property>
-       <name>yarn.nodemanager.resource-plugins.gpu.docker-plugin</name>
-       <value>nvidia-docker-v2</value>
-     </property>
    </configuration>
    ```
 
@@ -481,8 +374,6 @@ resourcemanager 使用的 scheduler 必须是 capacity scheduler,在 
capacity-
    docker.allowed.volume-drivers=/usr/bin/nvidia-docker
    
docker.allowed.devices=/dev/nvidiactl,/dev/nvidia-uvm,/dev/nvidia-uvm-tools,/dev/nvidia1,/dev/nvidia0
    docker.allowed.ro-mounts=nvidia_driver_375.26
-   # Use nvidia docker v2
-   docker.allowed.runtimes=nvidia
 
    [gpu]
    module.enabled=true
@@ -494,26 +385,140 @@ resourcemanager 使用的 scheduler 必须是 capacity scheduler,在 
capacity-
    yarn-hierarchy=/hadoop-yarn
    ```
 
-### 提交验证
+## 使用 yarn runtime 提交 tensorflow 任务
 
-Distributed-shell + GPU + cgroup
+### 使用 zipped python virtual environment 测试 tensorflow job 
+
+使用 ${SUBMARINE_REPO_PATH}/dev-support/mini-submarine/submarine/ 目录下的 
build_python_virtual_env.sh 文件,
+创建 zipped python virtual environment。生成的压缩文件可以被命名为 myvenv.zip,其中
+${SUBMARINE_REPO_PATH} 表示 submarine 代码的根路径。
+
+复制文件 
${SUBMARINE_REPO_PATH}/dev-support/mini-submarine/submarine/run_submarine_mnist_tony.sh
+到提交任务的服务器节点上。根据环境修改其中的变量 SUBMARINE_VERSION,HADOOP_VERSION,SUBMARINE_PATH,
+HADOOP_CONF_PATH,MNIST_PATH。如果开启了 Kerberos 安全认证,请删除命令里的参数 --insecure。
+
+执行一个分布式的 tensorflow 任务.
+```
+./run_submarine_mnist_tony.sh -d http://yann.lecun.com/exdb/mnist/
+```
+参数 -d 用来指定下载 mnist 数据的 URL 地址。
+
+### 使用 docker container 提交 tensorflow 任务 (TODO)
+
+
+## Yarn Service Runtime (不推荐)
+
+hadoop 3.1.0 提供了 yarn native service 功能,Submarine 可以利用 yarn native service 提交分布式机器学习任务。
+但是,由于使用 yarn native service 会引入一些额外的组件,导致部署和运维服务比较困难,因而在 Submarine 0.3.0 之后
+Yarn Service Runtime 不再推荐使用。我们建议直接使用 YarnRuntime,这样可以在 yarn 2.9 上提交机器学习任务。
+开启 Yarn Service Runtime,可以参照下面的方法:
+
+### 安装 Etcd
+
+运行 Submarine/install.sh 脚本,就可以在指定服务器中安装 Etcd 组件和服务自启动脚本。
+
+```shell
+$ ./Submarine/install.sh
+# 通过如下命令查看 Etcd 服务状态
+systemctl status Etcd.service
+```
+
+检查 Etcd 服务状态
+
+```shell
+$ etcdctl cluster-health
+member 3adf2673436aa824 is healthy: got healthy result from 
http://${etcd_host_ip1}:2379
+member 85ffe9aafb7745cc is healthy: got healthy result from 
http://${etcd_host_ip2}:2379
+member b3d05464c356441a is healthy: got healthy result from 
http://${etcd_host_ip3}:2379
+cluster is healthy
+
+$ etcdctl member list
+3adf2673436aa824: name=etcdnode3 peerURLs=http://${etcd_host_ip1}:2380 
clientURLs=http://${etcd_host_ip1}:2379 isLeader=false
+85ffe9aafb7745cc: name=etcdnode2 peerURLs=http://${etcd_host_ip2}:2380 
clientURLs=http://${etcd_host_ip2}:2379 isLeader=false
+b3d05464c356441a: name=etcdnode1 peerURLs=http://${etcd_host_ip3}:2380 
clientURLs=http://${etcd_host_ip3}:2379 isLeader=true
+```
+其中,${etcd_host_ip*} 是 etcd 服务器的 ip
+
+
+### 安装 Calico
+
+运行 Submarine/install.sh 脚本,就可以在指定服务器中安装 Calico 组件和服务自启动脚本。
 
-```bash
- ... job run \
- --env DOCKER_JAVA_HOME=/opt/java \
- --env DOCKER_HADOOP_HDFS_HOME=/hadoop-current --name distributed-tf-gpu \
- --env YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=calico-network \
- --docker_image tf-1.13.1-gpu:0.0.1 \
- --input_path hdfs://default/tmp/cifar-10-data \
- --checkpoint_path hdfs://default/user/hadoop/tf-distributed-checkpoint \
- --num_ps 0 \
- --ps_resources memory=4G,vcores=2,gpu=0 \
- --ps_launch_cmd "python /test/cifar10_estimator/cifar10_main.py 
--data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=0" \
- --worker_resources memory=4G,vcores=2,gpu=1 --verbose \
- --num_workers 1 \
- --worker_launch_cmd "python /test/cifar10_estimator/cifar10_main.py 
--data-dir=%%input_path%% --job-dir=%checkpoint_path% --train-steps=500 
--eval-batch-size=16 --train-batch-size=16 --sync --num-gpus=1"
 ```
+systemctl start calico-node.service
+systemctl status calico-node.service
+```
+
+#### 检查 Calico 网络
+
+```shell
+# 执行如下命令,注意:不会显示本服务器的状态,只显示其他的服务器状态
+$ calicoctl node status
+Calico process is running.
+
+IPv4 BGP status
++---------------+-------------------+-------+------------+-------------+
+| PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |    INFO     |
++---------------+-------------------+-------+------------+-------------+
+| ${host_ip1} | node-to-node mesh | up    | 2018-09-21 | Established |
+| ${host_ip2} | node-to-node mesh | up    | 2018-09-21 | Established |
+| ${host_ip3} | node-to-node mesh | up    | 2018-09-21 | Established |
++---------------+-------------------+-------+------------+-------------+
+
+IPv6 BGP status
+No IPv6 peers found.
+```
+
+创建 docker container,验证 calico 网络
+
+```
+docker network create --driver calico --ipam-driver calico-ipam calico-network
+docker run --net calico-network --name workload-A -tid busybox
+docker run --net calico-network --name workload-B -tid busybox
+docker exec workload-A ping workload-B
+```
+
+### Yarn Docker container 开启 Calico 网络
+在配置文件 yarn-site.xml 中,为 docker container 设置 Calico 网络。
+
+```
+  <property>
+    <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
+    <value>calico-network</value>
+  </property>
+  <property>
+    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
+    <value>default,docker</value>
+  </property>
+  <property>
+    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
+    <value>host,none,bridge,calico-network</value>
+  </property>
+```
+
+在配置文件 container-executor.cfg 中,将 calico-network 添加到允许的网络列表
+
+```
+docker.allowed.networks=bridge,host,none,calico-network
+```
+
+重启所有的 nodemanager 节点。
+
+
+### 启动 registry DNS 服务
+
+Yarn registry DNS server 是为服务发现功能而实现的 DNS 服务。yarn docker container 通过向 registry
+DNS server 注册,对外暴露 container 域名与 container IP/port 的映射关系。
+
+Yarn registry DNS 的详细配置信息和部署方式,可以参考 [Registry DNS Server](http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html)
+
+启动 registry DNS 的命令
+```
+sudo YARN_LOGFILE=registrydns.log ./yarn-daemon.sh start registrydns
+```
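As a rough illustration of what Registry DNS exposes: each container of a YARN service is registered under a name composed from the component instance, service name, user, and DNS domain (the concrete values below are hypothetical; see the Registry DNS documentation for the authoritative naming rules):

```shell
# Compose the DNS name a service container is registered under.
# All concrete values here are made up for illustration.
USER_NAME=hadoop
SERVICE_NAME=tf-job-001
COMPONENT_INSTANCE=worker-0
DNS_DOMAIN=ml.example.com
FQDN="${COMPONENT_INSTANCE}.${SERVICE_NAME}.${USER_NAME}.${DNS_DOMAIN}"
echo "$FQDN"
```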
+
+### Run submarine jobs
 
+To submit a submarine job with the YARN service runtime, refer to [Running Distributed CIFAR 10 Tensorflow Job_With_Yarn_Service_Runtime](RunningDistributedCifar10TFJobsWithYarnService.md)
 
 
 
 ## 问题
@@ -543,7 +548,7 @@ chown :yarn -R /sys/fs/cgroup/cpu,cpuacct
 chmod g+rwx -R /sys/fs/cgroup/cpu,cpuacct
 ```
 
-在支持gpu时,还需cgroup devices路径权限
+在支持 gpu 时,还需 cgroup devices 路径权限
 
 ```
 chown :yarn -R /sys/fs/cgroup/devices
@@ -620,7 +625,7 @@ $ kill -9 5007
 ```
 
 
-### 问题五:命令sudo nvidia-docker run 报错
+### 问题五:命令 sudo nvidia-docker run 报错
 
 ```
 docker: Error response from daemon: create nvidia_driver_361.42: VolumeDriver.Create: internal error, check logs for details.
diff --git a/docs/helper/QuickStart.md b/docs/helper/QuickStart.md
index dd5c6ac..d26eb4e 100644
--- a/docs/helper/QuickStart.md
+++ b/docs/helper/QuickStart.md
@@ -86,7 +86,7 @@ Although the run job command looks simple, different job may have very different
 For a quick try on Mnist example with TonY runtime, check [TonY Mnist Example](TonYRuntimeGuide.md)
 
 
-For a quick try on Cifar10 example with YARN native service runtime, check [YARN Service Cifar10 Example](RunningDistributedCifar10TFJobs.md)
+For a quick try on Cifar10 example with YARN native service runtime, check [YARN Service Cifar10 Example](RunningDistributedCifar10TFJobsWithYarnService.md)
 
 
 <br />
diff --git a/docs/helper/RunningDistributedCifar10TFJobs.md 
b/docs/helper/RunningDistributedCifar10TFJobsWithYarnService.md
similarity index 90%
rename from docs/helper/RunningDistributedCifar10TFJobs.md
rename to docs/helper/RunningDistributedCifar10TFJobsWithYarnService.md
index bd07deb..f4d9a99 100644
--- a/docs/helper/RunningDistributedCifar10TFJobs.md
+++ b/docs/helper/RunningDistributedCifar10TFJobsWithYarnService.md
@@ -13,7 +13,7 @@
    limitations under the License.
 -->
 
-# Cifar10 Tensorflow Estimator Example With YARN Service
+# Cifar10 Tensorflow Estimator Example With YARN Native Runtime
 
 ## Prepare data for training
 
@@ -55,13 +55,22 @@ Refer to [Write Dockerfile](WriteDockerfileTF.md) to build a Docker image or use
 
 ## Run Tensorflow jobs
 
+Set submarine.runtime.class to YarnServiceRuntimeFactory in submarine-site.xml.
+```
+<property>
+    <name>submarine.runtime.class</name>
+    <value>org.apache.submarine.server.submitter.yarnservice.YarnServiceRuntimeFactory</value>
+    <description>RuntimeFactory for Submarine jobs</description>
+  </property>
+```
+The submarine-site.xml file is located in ${SUBMARINE_HOME}/conf.
+
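The CLASSPATH lines in the examples below are long; a sketch of how they are assembled, with hypothetical install paths standing in for your own:

```shell
# Build the client classpath: Hadoop jars + the submarine "all" jar +
# the conf dir holding submarine-site.xml. Paths here are assumptions.
SUBMARINE_HOME=/opt/submarine
SUBMARINE_VERSION=0.3.0-SNAPSHOT
HADOOP_CP='/opt/hadoop/share/hadoop/common/*'   # stands in for `hadoop classpath --glob`
CLASSPATH="${HADOOP_CP}:${SUBMARINE_HOME}/submarine-all-${SUBMARINE_VERSION}.jar:${SUBMARINE_HOME}/conf"
echo "$CLASSPATH"
```

The last entry must be the conf directory so that submarine-site.xml is picked up by the client.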
 ### Run standalone training
 
 ```
-SUBMARINE_VERSION=0.2.0
-CLASSPATH=`path-to/hadoop classpath --glob`:path-to/hadoop-submarine-core-${SUBMARINE_VERSION}.jar:
-path-to/hadoop-submarine-yarnservice-runtime-${SUBMARINE_VERSION}.jar:path-to/hadoop-submarine-tony-
-runtime-${SUBMARINE_VERSION}.jar \
+SUBMARINE_VERSION=0.3.0-SNAPSHOT
+CLASSPATH=`${HADOOP_HOME}/bin/hadoop classpath --glob`:${SUBMARINE_HOME}/submarine-all-${SUBMARINE_VERSION}.jar:
+${SUBMARINE_HOME}/conf: \
 java org.apache.submarine.client.cli.Cli job run \
    --name tf-job-001 --verbose --docker_image <image> \
    --input_path hdfs://default/dataset/cifar-10-data \
@@ -80,10 +89,9 @@ Explanations:
 ### Run distributed training
 
 ```
-SUBMARINE_VERSION=0.2.0
-CLASSPATH=`path-to/hadoop classpath --glob`:path-to/hadoop-submarine-core-${SUBMARINE_VERSION}.jar:
-path-to/hadoop-submarine-yarnservice-runtime-${SUBMARINE_VERSION}.jar:path-to/hadoop-submarine-tony-
-runtime-${SUBMARINE_VERSION}.jar \
+SUBMARINE_VERSION=0.3.0-SNAPSHOT
+CLASSPATH=`${HADOOP_HOME}/bin/hadoop classpath --glob`:${SUBMARINE_HOME}/submarine-all-${SUBMARINE_VERSION}.jar:
+${SUBMARINE_HOME}/conf: \
 java org.apache.submarine.client.cli.Cli job run \
    --name tf-job-001 --verbose --docker_image tf-1.13.1-gpu:0.0.1 \
    --input_path hdfs://default/dataset/cifar-10-data \
@@ -182,8 +190,9 @@ When using YARN native service runtime, you can view multiple job training histo
 ```shell
 # Cleanup previous tensorboard service if needed
 
-SUBMARINE_VERSION=0.2.0
-CLASSPATH=`path-to/hadoop classpath --glob`:path-to/hadoop-submarine-core-${SUBMARINE_VERSION}.jar:path-to/hadoop-submarine-yarnservice-runtime-${SUBMARINE_VERSION}.jar:path-to/hadoop-submarine-tony-runtime-${SUBMARINE_VERSION}.jar \
+SUBMARINE_VERSION=0.3.0-SNAPSHOT
+CLASSPATH=`${HADOOP_HOME}/bin/hadoop classpath --glob`:${SUBMARINE_HOME}/submarine-all-${SUBMARINE_VERSION}.jar:
+${SUBMARINE_HOME}/conf: \
 java org.apache.submarine.client.cli.Cli job run \
   --name tensorboard-service \
   --verbose \
diff --git a/docs/helper/RunningSingleNodeCifar10PTJobs.md 
b/docs/helper/RunningSingleNodeCifar10PTJobsWithYarnService.md
similarity index 78%
rename from docs/helper/RunningSingleNodeCifar10PTJobs.md
rename to docs/helper/RunningSingleNodeCifar10PTJobsWithYarnService.md
index 293b288..6e46cf1 100644
--- a/docs/helper/RunningSingleNodeCifar10PTJobs.md
+++ b/docs/helper/RunningSingleNodeCifar10PTJobsWithYarnService.md
@@ -35,11 +35,23 @@ Refer to [Write Dockerfile](WriteDockerfilePT.md) to build a Docker image or use
 
 ## Running PyTorch jobs
 
+Set submarine.runtime.class to YarnServiceRuntimeFactory in submarine-site.xml.
+```
+<property>
+    <name>submarine.runtime.class</name>
+    <value>org.apache.submarine.server.submitter.yarnservice.YarnServiceRuntimeFactory</value>
+    <description>RuntimeFactory for Submarine jobs</description>
+  </property>
+```
+The submarine-site.xml file is located in ${SUBMARINE_HOME}/conf.
+
 ### Run standalone training
 
 ```shell
-export HADOOP_CLASSPATH="/home/systest/hadoop-submarine-score-yarnservice-runtime-0.2.0-SNAPSHOT.jar:/home/systest/hadoop-submarine-core-0.2.0-SNAPSHOT.jar"
-/opt/hadoop/bin/yarn jar /home/systest/hadoop-submarine-core-0.2.0-SNAPSHOT.jar job run \
+SUBMARINE_VERSION=0.3.0-SNAPSHOT
+CLASSPATH=`${HADOOP_HOME}/bin/hadoop classpath --glob`:${SUBMARINE_HOME}/submarine-all-${SUBMARINE_VERSION}.jar:
+${SUBMARINE_HOME}/conf: \
+java org.apache.submarine.client.cli.Cli job run \
 --name pytorch-job-001 \
 --verbose \
 --framework pytorch \
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/pytorch/base/ubuntu-16.04/Dockerfile.gpu.pytorch_latest
 b/docs/helper/docker/pytorch/base/ubuntu-16.04/Dockerfile.gpu.pytorch_latest
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/pytorch/base/ubuntu-16.04/Dockerfile.gpu.pytorch_latest
rename to 
docs/helper/docker/pytorch/base/ubuntu-16.04/Dockerfile.gpu.pytorch_latest
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/pytorch/build-all.sh 
b/docs/helper/docker/pytorch/build-all.sh
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/pytorch/build-all.sh
rename to docs/helper/docker/pytorch/build-all.sh
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/pytorch/with-cifar10-models/cifar10_tutorial.py
 b/docs/helper/docker/pytorch/with-cifar10-models/cifar10_tutorial.py
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/pytorch/with-cifar10-models/cifar10_tutorial.py
rename to docs/helper/docker/pytorch/with-cifar10-models/cifar10_tutorial.py
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/pytorch/with-cifar10-models/ubuntu-16.04/Dockerfile.gpu.pytorch_latest
 
b/docs/helper/docker/pytorch/with-cifar10-models/ubuntu-16.04/Dockerfile.gpu.pytorch_latest
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/pytorch/with-cifar10-models/ubuntu-16.04/Dockerfile.gpu.pytorch_latest
rename to 
docs/helper/docker/pytorch/with-cifar10-models/ubuntu-16.04/Dockerfile.gpu.pytorch_latest
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/base/ubuntu-16.04/Dockerfile.cpu.tf_1.13.1
 b/docs/helper/docker/tensorflow/base/ubuntu-16.04/Dockerfile.cpu.tf_1.13.1
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/base/ubuntu-16.04/Dockerfile.cpu.tf_1.13.1
rename to 
docs/helper/docker/tensorflow/base/ubuntu-16.04/Dockerfile.cpu.tf_1.13.1
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/base/ubuntu-16.04/Dockerfile.gpu.tf_1.13.1
 b/docs/helper/docker/tensorflow/base/ubuntu-16.04/Dockerfile.gpu.tf_1.13.1
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/base/ubuntu-16.04/Dockerfile.gpu.tf_1.13.1
rename to 
docs/helper/docker/tensorflow/base/ubuntu-16.04/Dockerfile.gpu.tf_1.13.1
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/build-all.sh 
b/docs/helper/docker/tensorflow/build-all.sh
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/build-all.sh
rename to docs/helper/docker/tensorflow/build-all.sh
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/Dockerfile.cpu.tf_1.13.1
 
b/docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/Dockerfile.cpu.tf_1.13.1
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/Dockerfile.cpu.tf_1.13.1
rename to 
docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/Dockerfile.cpu.tf_1.13.1
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/Dockerfile.gpu.tf_1.13.1
 
b/docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/Dockerfile.gpu.tf_1.13.1
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/Dockerfile.gpu.tf_1.13.1
rename to 
docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/Dockerfile.gpu.tf_1.13.1
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/README.md
 
b/docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/README.md
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/README.md
rename to 
docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/README.md
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10.py
 
b/docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10.py
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10.py
rename to 
docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10.py
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_main.py
 
b/docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_main.py
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_main.py
rename to 
docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_main.py
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_model.py
 
b/docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_model.py
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_model.py
rename to 
docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_model.py
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_utils.py
 
b/docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_utils.py
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_utils.py
rename to 
docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/cifar10_utils.py
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/generate_cifar10_tfrecords.py
 
b/docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/generate_cifar10_tfrecords.py
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/generate_cifar10_tfrecords.py
rename to 
docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/generate_cifar10_tfrecords.py
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/model_base.py
 
b/docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/model_base.py
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/model_base.py
rename to 
docs/helper/docker/tensorflow/with-cifar10-models/ubuntu-16.04/cifar10_estimator_tf_1.13.1/model_base.py
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/zeppelin-notebook-example/Dockerfile.gpu
 b/docs/helper/docker/tensorflow/zeppelin-notebook-example/Dockerfile.gpu
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/zeppelin-notebook-example/Dockerfile.gpu
rename to docs/helper/docker/tensorflow/zeppelin-notebook-example/Dockerfile.gpu
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/zeppelin-notebook-example/run_container.sh
 b/docs/helper/docker/tensorflow/zeppelin-notebook-example/run_container.sh
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/zeppelin-notebook-example/run_container.sh
rename to 
docs/helper/docker/tensorflow/zeppelin-notebook-example/run_container.sh
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/zeppelin-notebook-example/shiro.ini
 b/docs/helper/docker/tensorflow/zeppelin-notebook-example/shiro.ini
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/zeppelin-notebook-example/shiro.ini
rename to docs/helper/docker/tensorflow/zeppelin-notebook-example/shiro.ini
diff --git 
a/submarine-commons/commons-runtime/src/main/docker/tensorflow/zeppelin-notebook-example/zeppelin-site.xml
 b/docs/helper/docker/tensorflow/zeppelin-notebook-example/zeppelin-site.xml
similarity index 100%
rename from 
submarine-commons/commons-runtime/src/main/docker/tensorflow/zeppelin-notebook-example/zeppelin-site.xml
rename to 
docs/helper/docker/tensorflow/zeppelin-notebook-example/zeppelin-site.xml

