This is an automated email from the ASF dual-hosted git repository.

jiwq pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git


The following commit(s) were added to refs/heads/master by this push:
     new 66c0f7e  SUBMARINE-528. Polish documents for 0.4.0 release
66c0f7e is described below

commit 66c0f7ea5e636e9e0508f605543a6e35e1255c29
Author: Zhankun Tang <[email protected]>
AuthorDate: Wed Jun 24 15:34:41 2020 +0800

    SUBMARINE-528. Polish documents for 0.4.0 release
    
    ### What is this PR for?
    - Polish the main page document
    - Fix issues in the existing document
    
    ### What type of PR is it?
    Documentation
    
    ### What is the Jira issue?
    https://issues.apache.org/jira/browse/SUBMARINE-528
    
    ### How should this be tested?
    N/A
    
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? Yes
    
    Author: Zhankun Tang <[email protected]>
    Author: tangzhankun <[email protected]>
    
    Closes #316 from tangzhankun/SUBMARINE-528 and squashes the following 
commits:
    
    d627178 [Zhankun Tang] update jupyter notebook example
    0ae9d1a [Zhankun Tang] update jupyter notebook example
    fe5b9ca [Zhankun Tang] refinement based on review comments
    5a3ee15 [Zhankun Tang] refinement based on review comments
    beb4d92 [Zhankun Tang] refinement based on review comments
    c8b508d [Zhankun Tang] add quick start of submarine k8s in main page doc
    ad53364 [Zhankun Tang] update get_log in notebook
    70345d7 [Zhankun Tang] Update jupyter notebook with new wait_for_finish API
    c6588f9 [Zhankun Tang] Add SDK installation guide to user doc
    048c78e [tangzhankun] Update README.md
    4c08117 [Zhankun Tang] Add ping to verify port forwarding is working in doc
    07b883a [Zhankun Tang] Polish the main page and the helm charts document
---
 .gitignore                                         |   3 +
 README.md                                          |  63 ++-
 docs/userdocs/k8s/README.md                        |  42 ++
 docs/userdocs/k8s/helm.md                          |  87 +++-
 docs/userdocs/k8s/run-pytorch-experiment.md        |   2 +-
 docs/userdocs/k8s/run-tensorflow-experiment.md     |  18 +-
 .../example/submarine_experiment_sdk.ipynb         | 493 +++++++++++++--------
 7 files changed, 509 insertions(+), 199 deletions(-)

diff --git a/.gitignore b/.gitignore
index c4c7f50..996243c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -95,3 +95,6 @@ submarine-security/spark-security/derby.log
 .classpath
 .settings
 .factorypath
+
+# jupyter notebook checkpoints
+.ipynb_checkpoints
diff --git a/README.md b/README.md
index 7580f09..c7018c6 100644
--- a/README.md
+++ b/README.md
@@ -64,11 +64,68 @@ _Theodore Levitt_ once said:
 
 As mentioned above, Submarine aims to bring data-scientist-friendly user 
interfaces that make their lives easier. Here are some examples of Submarine's 
user interfaces.
 
-<FIXME: Add/FIX more contents below>
+### Submit a distributed TensorFlow experiment via Submarine Python SDK
 
-<WIP>
+#### Run a TensorFlow MNIST experiment
+```python
+
+# Create a Submarine client that talks to the Submarine server
+submarine_client = submarine.ExperimentClient(host='http://localhost:8080')
+
+# The experiment's environment: either a Docker image or a Conda-based 
environment
+environment = Environment(image='gcr.io/kubeflow-ci/tf-dist-mnist-test:1.0')
+
+# Specify the experiment's name, the framework it uses, the namespace it 
will run in,
+# and the entry point. It can also accept environment variables, etc.
+# For a PyTorch job, the framework should be 'Pytorch'.
+experiment_meta = ExperimentMeta(name='mnist-dist',
+                                 namespace='default',
+                                 framework='Tensorflow',
+                                 cmd='python /var/tf_dist_mnist/dist_mnist.py 
--train_steps=100')
+# 1 PS task with 2 CPUs and 1 GB of memory
+ps_spec = ExperimentTaskSpec(resources='cpu=2,memory=1024M',
+                             replicas=1)
+# 1 Worker task
+worker_spec = ExperimentTaskSpec(resources='cpu=2,memory=1024M',
+                                 replicas=1)
+
+# Wrap up the meta, environment and task specs into an experiment.
+# For a PyTorch job, the spec keys would be "Master" and "Worker".
+experiment_spec = ExperimentSpec(meta=experiment_meta,
+                                 environment=environment,
+                                 spec={'Ps':ps_spec, 'Worker': worker_spec})
+
+# Submit the experiment to submarine server
+experiment = 
submarine_client.create_experiment(experiment_spec=experiment_spec)
+
+# Get the experiment ID
+id = experiment['experimentId']
+
+```
+
+#### Query a specific experiment
+```python
+submarine_client.get_experiment(id)
+```
+
+#### Wait for finish
+
+```python
+submarine_client.wait_for_finish(id)
+```
+
+#### Get the experiment's log
+```python
+submarine_client.get_log(id)
+```
+
+#### Get all running experiments
+```python
+submarine_client.list_experiments(status='running')
+```
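
The `resources` strings used above are simple comma-separated `key=value` pairs. As a rough illustration (this helper is hypothetical, not part of the Submarine SDK), they can be parsed into the same shape as the `resourceMap` field the server returns for each task spec:

```python
# Hypothetical helper, NOT part of the Submarine SDK: parse a resource
# string such as "cpu=2,memory=1024M" into a dict, mirroring the
# 'resourceMap' field returned by the server for each task spec.
def parse_resources(resources):
    return dict(pair.split('=', 1) for pair in resources.split(','))

print(parse_resources('cpu=2,memory=1024M'))
# {'cpu': '2', 'memory': '1024M'}
```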
+
+For a quick start, see [Submarine On K8s](docs/userdocs/k8s/README.md).
 
-### Submit a distributed Tensorflow experiment via Submarine Python SDK
 
 ### Submit a pre-defined experiment template job
 
diff --git a/docs/userdocs/k8s/README.md b/docs/userdocs/k8s/README.md
index 783efaa..4733a91 100644
--- a/docs/userdocs/k8s/README.md
+++ b/docs/userdocs/k8s/README.md
@@ -31,6 +31,48 @@ After you have an up-and-running K8s, you can follow 
[Submarine Helm Charts Guid
 ## Use Submarine
 
 ### Model training (experiment) on K8s
+
+#### Prepare a Python Environment to Run the Submarine SDK
+
+The Submarine SDK requires Python 3.7+.
+It's best to try this in a fresh Python environment created with `Anaconda` 
or Python `virtualenv`, to avoid disturbing your existing Python environment.
+A sample Python virtual environment can be set up like this:
+```bash
+wget 
https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar.gz
+tar xf virtualenv-16.0.0.tar.gz
+
+# Make sure to install using Python 3
+python3 virtualenv-16.0.0/virtualenv.py venv
+. venv/bin/activate
+```
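
Since the SDK targets Python 3.7+, a quick sanity check of the active interpreter can save trouble later (a generic snippet, not from the Submarine docs):

```python
import sys

# Check that the active interpreter is at least Python 3.7,
# the minimum the Submarine SDK assumes.
ok = sys.version_info >= (3, 7)
print(sys.version_info[:2], 'OK' if ok else 'too old for submarine-sdk')
```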
+
+#### With Submarine SDK (Recommended)
+
+- Install SDK from pypi.org
+
Starting from 0.4.0, Submarine provides a Python SDK. Change the version 
below to the one you need.
+
+```bash
+pip install submarine-sdk==0.4.0
+```
+
+- Install SDK from source code
+
+Please first clone the code from GitHub, or go to 
`http://submarine.apache.org/download.html` to download the released source code.
+```bash
+git clone https://github.com/apache/submarine.git
+git checkout <correct release tag/branch>
+cd submarine/submarine-sdk/pysubmarine
+pip install .
+```
+
+- Run with Submarine Python SDK
+
+Assuming you've installed Submarine on K8s and forwarded the service to 
localhost, you can now open a Python shell, a Jupyter notebook, or any tool 
with the Submarine SDK installed.
+
+Follow [SDK experiment 
example](../../../submarine-sdk/pysubmarine/example/submarine_experiment_sdk.ipynb)
 to try the SDK.
+
+#### With REST API
 - [Run model training using Tensorflow](run-tensorflow-experiment.md)
 - [Run model training using PyTorch](run-pytorch-experiment.md)
 - [Experiment API Reference](api/experiment.md)
diff --git a/docs/userdocs/k8s/helm.md b/docs/userdocs/k8s/helm.md
index 18d3112..28997ff 100644
--- a/docs/userdocs/k8s/helm.md
+++ b/docs/userdocs/k8s/helm.md
@@ -20,31 +20,94 @@ under the License.
 
 # Deploy Submarine On K8s
 
-## Deploy Submarine using Helm Chart (Recommended)
+## Deploy Submarine Using Helm Chart (Recommended)
 
 Submarine's Helm chart deploys not only the Submarine server but also the 
TF Operator / PyTorch Operator (which the Submarine server uses to run 
TF/PyTorch jobs on K8s).
 
-### Create images
-submarine server
+
+### Install Helm
+
+Helm v3 is the minimum requirement.
+See here for installation: https://helm.sh/docs/intro/install/
+
+### Install Submarine
+
+The Submarine Helm charts are released with the source code for now.
+Please go to `http://submarine.apache.org/download.html` to download them.
+
+- Install Helm charts from source code
 ```bash
-./dev-support/docker-images/submarine/build.sh
+cd <PathTo>/submarine
+helm install submarine ./helm-charts/submarine
 ```
+This will install Submarine in the "default" namespace.
+The images come from the Docker Hub repository `apache/submarine`. See 
`./helm-charts/submarine/values.yaml` for more details.
 
-submarine database
+If we'd like to use a different namespace, such as "submarine":
 ```bash
-./dev-support/docker-images/database/build.sh
+kubectl create namespace submarine
+helm install submarine ./helm-charts/submarine -n submarine
 ```
 
-### install helm
-For more info see https://helm.sh/docs/intro/install/
+> Note: if you encounter the issue below during installation:
+```bash
+Error: rendered manifests contain a resource that already exists.
+Unable to continue with install: existing resource conflict: namespace: , 
name: podgroups.scheduling.incubator.k8s.io, existing_kind: 
apiextensions.k8s.io/v1beta1, Kind=CustomResourceDefinition, new_kind: 
apiextensions.k8s.io/v1beta1, Kind=CustomResourceDefinition
+```
+It might be caused by a previously installed Submarine chart. Fix it by 
running:
+```bash
+kubectl delete crd/tfjobs.kubeflow.org && kubectl delete 
crd/podgroups.scheduling.incubator.k8s.io && kubectl delete 
crd/pytorchjobs.kubeflow.org
+```
+
+- Verify installation
 
-### Deploy Submarine Server, mysql
-You can modify some settings in ./helm-charts/submarine/values.yaml
+Once it is installed, check with the command below; you should see output 
similar to this:
 ```bash
-helm install submarine ./helm-charts/submarine
+kubectl get pods
+```
+
+```bash
+NAME                                 READY     STATUS    RESTARTS   AGE
+pytorch-operator-54854bf847-x65nk    1/1       Running   0          5m
+submarine-database-5f74f747d-dzmf6   1/1       Running   0          5m
+submarine-server-6f449bc967-cqkkv    1/1       Running   0          5m
+tf-job-operator-c9cd7ccbd-4dzcs      1/1       Running   0          5m
 ```
 
-### Delete deployment
+### Enable Local Access to the Submarine Server
+By default, the Submarine server exposes port 8080 within the K8s cluster.
+To access the server from outside the cluster, we need to expose the 
service.
+We can use either port-forward or a K8s `Ingress`; here is a port-forward 
example.
+
+```bash
+kubectl port-forward svc/submarine-server 8080:8080
+
+# In another terminal, run the command below to verify it works
+curl http://127.0.0.1:8080/api/v1/experiment/ping
+{"status":"OK","code":200,"success":true,"message":null,"result":"Pong","attributes":{}}
+```
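
The ping endpoint returns a small JSON document. As a sketch of a client-side health check (hypothetical code, parsing only the response body shown above), the standard library is enough:

```python
import json

# Response body from the ping endpoint, copied from the example above.
body = ('{"status":"OK","code":200,"success":true,'
        '"message":null,"result":"Pong","attributes":{}}')

resp = json.loads(body)
# Treat the server as healthy when the call succeeded and returned "Pong".
healthy = resp['success'] and resp['result'] == 'Pong'
print(healthy)  # True
```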
+
+### Uninstall Submarine
 ```bash
 helm delete submarine
 ```
+
+### Create Your Custom Submarine Images (Optional)
+Sometimes we'd like to make modifications to the images.
+After that, you need to rebuild the Submarine images:
+> Note that you need to make sure the images built below are accessible from 
K8s.
+> Usually this requires retagging them and pushing to a proper Docker registry.
+
+```bash
+mvn clean package -DskipTests
+```
+
+Build submarine server image:
+```bash
+./dev-support/docker-images/submarine/build.sh
+```
+
+Build submarine database image:
+```bash
+./dev-support/docker-images/database/build.sh
+```
diff --git a/docs/userdocs/k8s/run-pytorch-experiment.md 
b/docs/userdocs/k8s/run-pytorch-experiment.md
index 30debd2..a1b6f9c 100644
--- a/docs/userdocs/k8s/run-pytorch-experiment.md
+++ b/docs/userdocs/k8s/run-pytorch-experiment.md
@@ -101,7 +101,7 @@ curl -X POST -H "Content-Type: application/json" -d '
     }
   }
 }
-' http://127.0.0.1/api/v1/experiment
+' http://127.0.0.1:8080/api/v1/experiment
 ```
 
 **Example Response:**
diff --git a/docs/userdocs/k8s/run-tensorflow-experiment.md 
b/docs/userdocs/k8s/run-tensorflow-experiment.md
index 23459a0..a06ab56 100644
--- a/docs/userdocs/k8s/run-tensorflow-experiment.md
+++ b/docs/userdocs/k8s/run-tensorflow-experiment.md
@@ -36,10 +36,10 @@ environment:
 spec:
   Ps:
     replicas: 1
-    resources: "cpu=1,memory=512M"
+    resources: "cpu=1,memory=1024M"
   Worker:
     replicas: 1
-    resources: "cpu=1,memory=512M"
+    resources: "cpu=1,memory=1024M"
 ```
 
 **JSON Format:**
@@ -60,11 +60,11 @@ spec:
   "spec": {
     "Ps": {
       "replicas": 1,
-      "resources": "cpu=1,memory=512M"
+      "resources": "cpu=1,memory=1024M"
     },
     "Worker": {
       "replicas": 1,
-      "resources": "cpu=1,memory=512M"
+      "resources": "cpu=1,memory=1024M"
     }
   }
 }
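
The request body is plain JSON, so it can also be generated programmatically. A minimal sketch (field names taken from the spec above; the `task_spec` helper is made up for illustration):

```python
import json

# Hypothetical helper: build one task-spec entry as shown above.
def task_spec(replicas, resources):
    return {"replicas": replicas, "resources": resources}

spec = {
    "Ps": task_spec(1, "cpu=1,memory=1024M"),
    "Worker": task_spec(1, "cpu=1,memory=1024M"),
}
body = json.dumps({"spec": spec})

# Round-trip check: the generated body carries the same values.
print(json.loads(body)["spec"]["Ps"]["resources"])
# cpu=1,memory=1024M
```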
@@ -92,15 +92,15 @@ curl -X POST -H "Content-Type: application/json" -d '
   "spec": {
     "Ps": {
       "replicas": 1,
-      "resources": "cpu=1,memory=512M"
+      "resources": "cpu=1,memory=1024M"
     },
     "Worker": {
       "replicas": 1,
-      "resources": "cpu=1,memory=512M"
+      "resources": "cpu=1,memory=1024M"
     }
   }
 }
-' http://127.0.0.1/api/v1/experiment
+' http://127.0.0.1:8080/api/v1/experiment
 ```
 
 **Example Response:**
@@ -130,11 +130,11 @@ curl -X POST -H "Content-Type: application/json" -d '
             "spec": {
                 "Ps": {
                     "replicas": 1,
-                    "resources": "cpu=1,memory=512M"
+                    "resources": "cpu=1,memory=1024M"
                 },
                 "Worker": {
                     "replicas": 1,
-                    "resources": "cpu=1,memory=512M"
+                    "resources": "cpu=1,memory=1024M"
                 }
             }
         }
diff --git a/submarine-sdk/pysubmarine/example/submarine_experiment_sdk.ipynb 
b/submarine-sdk/pysubmarine/example/submarine_experiment_sdk.ipynb
index efd9b87..2083a3f 100644
--- a/submarine-sdk/pysubmarine/example/submarine_experiment_sdk.ipynb
+++ b/submarine-sdk/pysubmarine/example/submarine_experiment_sdk.ipynb
@@ -45,7 +45,7 @@
    },
    "outputs": [],
    "source": [
-    "submarine_client = 
submarine.ExperimentClient(host='http://submarine:8080')"
+    "submarine_client = 
submarine.ExperimentClient(host='http://localhost:8080')"
    ]
   },
   {
@@ -58,7 +58,7 @@
    "source": [
     "### Define TensorFlow experiment spec¶\n",
     "Define Submarine spec¶\n",
-    "The demo only creates a worker of TF experiment to run mnist sample."
+    "The demo creates a PS and a worker for a TF experiment to run the mnist 
sample."
    ]
   },
   {
@@ -72,22 +72,21 @@
    },
    "outputs": [],
    "source": [
-    "experiment_meta = ExperimentMeta(name='mnist',\n",
-    "                                 namespace='submarine',\n",
+    "environment = 
Environment(image='gcr.io/kubeflow-ci/tf-dist-mnist-test:1.0')\n",
+    "experiment_meta = ExperimentMeta(name='mnist-dist',\n",
+    "                                 namespace='default',\n",
     "                                 framework='Tensorflow',\n",
-    "                                 cmd='python 
/var/tf_mnist/mnist_with_summaries.py'\n",
-    "                                    ' --log_dir=/train/log 
--learning_rate=0.01'\n",
-    "                                    ' --batch_size=150',\n",
-    "                                 env_vars={'ENV1': 'ENV1'})\n",
+    "                                 cmd='python 
/var/tf_dist_mnist/dist_mnist.py --train_steps=100'\n",
+    "                                 , env_vars={'ENV1': 'ENV1'})\n",
     "\n",
-    "worker_spec = ExperimentTaskSpec(resources='cpu=4,memory=2048M',\n",
+    "worker_spec = ExperimentTaskSpec(resources='cpu=1,memory=1024M',\n",
+    "                                 replicas=1)\n",
+    "ps_spec = ExperimentTaskSpec(resources='cpu=1,memory=1024M',\n",
     "                                 replicas=1)\n",
-    "\n",
-    "environment = 
Environment(image='gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0')\n",
     "\n",
     "experiment_spec = ExperimentSpec(meta=experiment_meta,\n",
     "                                 environment=environment,\n",
-    "                                 spec={'Worker': worker_spec}) \n"
+    "                                 spec={'Ps' : ps_spec,'Worker': 
worker_spec})\n"
    ]
   },
   {
@@ -102,19 +101,48 @@
    "execution_count": 4,
    "metadata": {
     "pycharm": {
-     "name": "#%%\n",
-     "is_executing": false
+     "is_executing": false,
+     "name": "#%%\n"
     },
     "scrolled": true
    },
    "outputs": [
     {
      "data": {
-      "text/plain": "{'experimentId': 'experiment_1592480334465_0001',\n 
'name': 'mnist',\n 'uid': '47f62be1-9d5a-473b-bac2-a7d40c365b45',\n 'status': 
'Accepted',\n 'acceptedTime': '2020-06-18T19:39:54.000+08:00',\n 'createdTime': 
None,\n 'runningTime': None,\n 'finishedTime': None,\n 'spec': {'meta': 
{'name': 'mnist',\n   'namespace': 'submarine',\n   'framework': 
'Tensorflow',\n   'cmd': 'python /var/tf_mnist/mnist_with_summaries.py 
--log_dir=/train/log --learning_rate=0.01 --batch_siz [...]
+      "text/plain": [
+       "{'experimentId': 'experiment_1592969710478_0001',\n",
+       " 'name': 'mnist-dist',\n",
+       " 'uid': '360886c6-b5cc-11ea-b5f2-025000000001',\n",
+       " 'status': 'Accepted',\n",
+       " 'acceptedTime': '2020-06-24T11:38:47.000+08:00',\n",
+       " 'createdTime': None,\n",
+       " 'runningTime': None,\n",
+       " 'finishedTime': None,\n",
+       " 'spec': {'meta': {'name': 'mnist-dist',\n",
+       "   'namespace': 'default',\n",
+       "   'framework': 'Tensorflow',\n",
+       "   'cmd': 'python /var/tf_dist_mnist/dist_mnist.py 
--train_steps=100',\n",
+       "   'envVars': {'ENV1': 'ENV1'}},\n",
+       "  'environment': {'image': 
'gcr.io/kubeflow-ci/tf-dist-mnist-test:1.0'},\n",
+       "  'spec': {'Ps': {'replicas': 1,\n",
+       "    'resources': 'cpu=1,memory=1024M',\n",
+       "    'name': None,\n",
+       "    'image': None,\n",
+       "    'cmd': None,\n",
+       "    'envVars': None,\n",
+       "    'resourceMap': {'memory': '1024M', 'cpu': '1'}},\n",
+       "   'Worker': {'replicas': 1,\n",
+       "    'resources': 'cpu=1,memory=1024M',\n",
+       "    'name': None,\n",
+       "    'image': None,\n",
+       "    'cmd': None,\n",
+       "    'envVars': None,\n",
+       "    'resourceMap': {'memory': '1024M', 'cpu': '1'}}}}}"
+      ]
      },
+     "execution_count": 4,
      "metadata": {},
-     "output_type": "execute_result",
-     "execution_count": 4
+     "output_type": "execute_result"
     }
    ],
    "source": [
@@ -134,18 +162,47 @@
    "execution_count": 5,
    "metadata": {
     "pycharm": {
-     "name": "#%%\n",
-     "is_executing": false
+     "is_executing": false,
+     "name": "#%%\n"
     }
    },
    "outputs": [
     {
      "data": {
-      "text/plain": "{'experimentId': 'experiment_1592480334465_0001',\n 
'name': 'mnist',\n 'uid': '47f62be1-9d5a-473b-bac2-a7d40c365b45',\n 'status': 
'Running',\n 'acceptedTime': '2020-06-18T19:39:54.000+08:00',\n 'createdTime': 
'2020-06-18T19:39:54.000+08:00',\n 'runningTime': 
'2020-06-18T19:39:55.000+08:00',\n 'finishedTime': None,\n 'spec': {'meta': 
{'name': 'mnist',\n   'namespace': 'submarine',\n   'framework': 
'Tensorflow',\n   'cmd': 'python /var/tf_mnist/mnist_with_summaries.py  [...]
+      "text/plain": [
+       "{'experimentId': 'experiment_1592969710478_0001',\n",
+       " 'name': 'mnist-dist',\n",
+       " 'uid': '360886c6-b5cc-11ea-b5f2-025000000001',\n",
+       " 'status': 'Running',\n",
+       " 'acceptedTime': '2020-06-24T11:38:47.000+08:00',\n",
+       " 'createdTime': '2020-06-24T11:38:47.000+08:00',\n",
+       " 'runningTime': '2020-06-24T11:38:49.000+08:00',\n",
+       " 'finishedTime': None,\n",
+       " 'spec': {'meta': {'name': 'mnist-dist',\n",
+       "   'namespace': 'default',\n",
+       "   'framework': 'Tensorflow',\n",
+       "   'cmd': 'python /var/tf_dist_mnist/dist_mnist.py 
--train_steps=100',\n",
+       "   'envVars': {'ENV1': 'ENV1'}},\n",
+       "  'environment': {'image': 
'gcr.io/kubeflow-ci/tf-dist-mnist-test:1.0'},\n",
+       "  'spec': {'Ps': {'replicas': 1,\n",
+       "    'resources': 'cpu=1,memory=1024M',\n",
+       "    'name': None,\n",
+       "    'image': None,\n",
+       "    'cmd': None,\n",
+       "    'envVars': None,\n",
+       "    'resourceMap': {'memory': '1024M', 'cpu': '1'}},\n",
+       "   'Worker': {'replicas': 1,\n",
+       "    'resources': 'cpu=1,memory=1024M',\n",
+       "    'name': None,\n",
+       "    'image': None,\n",
+       "    'cmd': None,\n",
+       "    'envVars': None,\n",
+       "    'resourceMap': {'memory': '1024M', 'cpu': '1'}}}}}"
+      ]
      },
+     "execution_count": 5,
      "metadata": {},
-     "output_type": "execute_result",
-     "execution_count": 5
+     "output_type": "execute_result"
     }
    ],
    "source": [
@@ -165,18 +222,48 @@
    "execution_count": 6,
    "metadata": {
     "pycharm": {
-     "name": "#%%\n",
-     "is_executing": false
-    }
+     "is_executing": false,
+     "name": "#%%\n"
+    },
+    "scrolled": true
    },
    "outputs": [
     {
      "data": {
-      "text/plain": "[{'experimentId': 'experiment_1592480334465_0001',\n  
'name': 'mnist',\n  'uid': '47f62be1-9d5a-473b-bac2-a7d40c365b45',\n  'status': 
'Running',\n  'acceptedTime': '2020-06-18T19:39:54.000+08:00',\n  
'createdTime': '2020-06-18T19:39:54.000+08:00',\n  'runningTime': 
'2020-06-18T19:39:55.000+08:00',\n  'finishedTime': None,\n  'spec': {'meta': 
{'name': 'mnist',\n    'namespace': 'submarine',\n    'framework': 
'Tensorflow',\n    'cmd': 'python /var/tf_mnist/mnist_with_s [...]
+      "text/plain": [
+       "[{'experimentId': 'experiment_1592969710478_0001',\n",
+       "  'name': 'mnist-dist',\n",
+       "  'uid': '360886c6-b5cc-11ea-b5f2-025000000001',\n",
+       "  'status': 'Running',\n",
+       "  'acceptedTime': '2020-06-24T11:38:47.000+08:00',\n",
+       "  'createdTime': '2020-06-24T11:38:47.000+08:00',\n",
+       "  'runningTime': '2020-06-24T11:38:49.000+08:00',\n",
+       "  'finishedTime': None,\n",
+       "  'spec': {'meta': {'name': 'mnist-dist',\n",
+       "    'namespace': 'default',\n",
+       "    'framework': 'Tensorflow',\n",
+       "    'cmd': 'python /var/tf_dist_mnist/dist_mnist.py 
--train_steps=100',\n",
+       "    'envVars': {'ENV1': 'ENV1'}},\n",
+       "   'environment': {'image': 
'gcr.io/kubeflow-ci/tf-dist-mnist-test:1.0'},\n",
+       "   'spec': {'Ps': {'replicas': 1,\n",
+       "     'resources': 'cpu=1,memory=1024M',\n",
+       "     'name': None,\n",
+       "     'image': None,\n",
+       "     'cmd': None,\n",
+       "     'envVars': None,\n",
+       "     'resourceMap': {'memory': '1024M', 'cpu': '1'}},\n",
+       "    'Worker': {'replicas': 1,\n",
+       "     'resources': 'cpu=1,memory=1024M',\n",
+       "     'name': None,\n",
+       "     'image': None,\n",
+       "     'cmd': None,\n",
+       "     'envVars': None,\n",
+       "     'resourceMap': {'memory': '1024M', 'cpu': '1'}}}}}]"
+      ]
      },
+     "execution_count": 6,
      "metadata": {},
-     "output_type": "execute_result",
-     "execution_count": 6
+     "output_type": "execute_result"
     }
    ],
    "source": [
@@ -188,6 +275,37 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "### Wait for the experiment to finish"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: 
FutureWarning: Conversion of the second argument of issubdtype from `float` to 
`np.floating` is deprecated. In future, it will be treated as `np.float64 == 
np.dtype(float).type`.\n",
+      "  from ._conv import register_converters as _register_converters\n",
+      "2020-06-24 03:39:55.150301: I 
tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports 
instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 
AVX AVX2 FMA\n",
+      "2020-06-24 03:39:55.154457: I 
tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize 
GrpcChannelCache for job ps -> {0 -> localhost:2222}\n",
+      "2020-06-24 03:39:55.154492: I 
tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize 
GrpcChannelCache for job worker -> {0 -> 
mnist-dist-worker-0.default.svc:2222}\n",
+      "2020-06-24 03:39:55.155476: I 
tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server 
with target: grpc://localhost:2222\n"
+     ]
+    }
+   ],
+   "source": [
+    "submarine_client.wait_for_finish(id)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
     "### Get specific experiment training log "
    ]
   },
@@ -196,154 +314,152 @@
    "execution_count": 9,
    "metadata": {
     "pycharm": {
-     "name": "#%%\n",
-     "is_executing": false
-    }
+     "is_executing": false,
+     "name": "#%%\n"
+    },
+    "scrolled": true
    },
    "outputs": [
     {
      "name": "stderr",
+     "output_type": "stream",
      "text": [
-      "The logs of Pod mnist-worker-0:\n\n",
-      "WARNING:tensorflow:From /var/tf_mnist/mnist_with_summaries.py:39: 
read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is 
deprecated and will be removed in a future version.\n",
-      "Instructions for updating:\n",
-      "Please use alternatives such as official/mnist/dataset.py from 
tensorflow/models.\n",
-      "WARNING:tensorflow:From 
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260:
 maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is 
deprecated and will be removed in a future version.\n",
+      "The logs of Pod mnist-dist-worker-0:\n",
+      "\n",
+      "/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: 
FutureWarning: Conversion of the second argument of issubdtype from `float` to 
`np.floating` is deprecated. In future, it will be treated as `np.float64 == 
np.dtype(float).type`.\n",
+      "  from ._conv import register_converters as _register_converters\n",
+      "2020-06-24 03:39:55.118374: I 
tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports 
instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 
AVX AVX2 FMA\n",
+      "2020-06-24 03:39:55.148641: I 
tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize 
GrpcChannelCache for job ps -> {0 -> mnist-dist-ps-0.default.svc:2222}\n",
+      "2020-06-24 03:39:55.148726: I 
tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize 
GrpcChannelCache for job worker -> {0 -> localhost:2222}\n",
+      "2020-06-24 03:39:55.150348: I 
tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server 
with target: grpc://localhost:2222\n",
+      "WARNING:tensorflow:From /var/tf_dist_mnist/dist_mnist.py:239: __init__ 
(from tensorflow.python.training.supervisor) is deprecated and will be removed 
in a future version.\n",
       "Instructions for updating:\n",
-      "Please write your own downloading logic.\n",
-      "WARNING:tensorflow:From 
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252:
 wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is 
deprecated and will be removed in a future version.\n",
-      "Instructions for updating:\n",
-      "Please use urllib or similar directly.\n",
-      "WARNING:tensorflow:From 
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262:
 extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is 
deprecated and will be removed in a future version.\n",
-      "Instructions for updating:\n",
-      "Please use tf.data to implement this functionality.\n",
-      "WARNING:tensorflow:From 
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267:
 extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is 
deprecated and will be removed in a future version.\n",
-      "Instructions for updating:\n",
-      "Please use tf.data to implement this functionality.\n",
-      "WARNING:tensorflow:From 
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290:
 __init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is 
deprecated and will be removed in a future version.\n",
-      "Instructions for updating:\n",
-      "Please use alternatives such as official/mnist/dataset.py from 
tensorflow/models.\n",
-      "2020-06-18 11:39:58.166584: I 
tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports 
instructions that this TensorFlow binary was not compiled to use: AVX2 FMA\n",
+      "Please switch to tf.train.MonitoredTrainingSession\n",
+      "2020-06-24 03:39:55.880787: I 
tensorflow/core/distributed_runtime/master_session.cc:1017] Start master 
session 23a80a92d64440cc with config: device_filters: \"/job:ps\" 
device_filters: \"/job:worker/task:0\" allow_soft_placement: true\n",
       "Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.\n",
-      "Extracting 
/tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz\n",
+      "Extracting /tmp/mnist-data/train-images-idx3-ubyte.gz\n",
       "Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.\n",
-      "Extracting 
/tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz\n",
+      "Extracting /tmp/mnist-data/train-labels-idx1-ubyte.gz\n",
       "Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.\n",
-      "Extracting 
/tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz\n",
+      "Extracting /tmp/mnist-data/t10k-images-idx3-ubyte.gz\n",
       "Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.\n",
-      "Extracting 
/tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz\n",
-      "Accuracy at step 0: 0.0954\n",
-      "Accuracy at step 10: 0.6885\n",
-      "Accuracy at step 20: 0.8634\n",
-      "Accuracy at step 30: 0.8975\n",
-      "Accuracy at step 40: 0.9174\n",
-      "Accuracy at step 50: 0.9235\n",
-      "Accuracy at step 60: 0.9276\n",
-      "Accuracy at step 70: 0.9297\n",
-      "Accuracy at step 80: 0.9317\n",
-      "Accuracy at step 90: 0.9401\n",
-      "Adding run metadata for 99\n",
-      "Accuracy at step 100: 0.9389\n",
-      "Accuracy at step 110: 0.9426\n",
-      "Accuracy at step 120: 0.9467\n",
-      "Accuracy at step 130: 0.948\n",
-      "Accuracy at step 140: 0.9497\n",
-      "Accuracy at step 150: 0.9514\n",
-      "Accuracy at step 160: 0.9534\n",
-      "Accuracy at step 170: 0.9464\n",
-      "Accuracy at step 180: 0.9494\n",
-      "Accuracy at step 190: 0.9496\n",
-      "Adding run metadata for 199\n",
-      "Accuracy at step 200: 0.9503\n",
-      "Accuracy at step 210: 0.9537\n",
-      "Accuracy at step 220: 0.953\n",
-      "Accuracy at step 230: 0.9521\n",
-      "Accuracy at step 240: 0.959\n",
-      "Accuracy at step 250: 0.9579\n",
-      "Accuracy at step 260: 0.9546\n",
-      "Accuracy at step 270: 0.9518\n",
-      "Accuracy at step 280: 0.958\n",
-      "Accuracy at step 290: 0.9598\n",
-      "Adding run metadata for 299\n",
-      "Accuracy at step 300: 0.9578\n",
-      "Accuracy at step 310: 0.9622\n",
-      "Accuracy at step 320: 0.9654\n",
-      "Accuracy at step 330: 0.9588\n",
-      "Accuracy at step 340: 0.9605\n",
-      "Accuracy at step 350: 0.9626\n",
-      "Accuracy at step 360: 0.9561\n",
-      "Accuracy at step 370: 0.9647\n",
-      "Accuracy at step 380: 0.9659\n",
-      "Accuracy at step 390: 0.9618\n",
-      "Adding run metadata for 399\n",
-      "Accuracy at step 400: 0.9621\n",
-      "Accuracy at step 410: 0.9642\n",
-      "Accuracy at step 420: 0.9652\n",
-      "Accuracy at step 430: 0.9647\n",
-      "Accuracy at step 440: 0.9642\n",
-      "Accuracy at step 450: 0.9648\n",
-      "Accuracy at step 460: 0.9652\n",
-      "Accuracy at step 470: 0.965\n",
-      "Accuracy at step 480: 0.9659\n",
-      "Accuracy at step 490: 0.9615\n",
-      "Adding run metadata for 499\n",
-      "Accuracy at step 500: 0.9599\n",
-      "Accuracy at step 510: 0.9648\n",
-      "Accuracy at step 520: 0.9654\n",
-      "Accuracy at step 530: 0.9576\n",
-      "Accuracy at step 540: 0.9664\n",
-      "Accuracy at step 550: 0.9661\n",
-      "Accuracy at step 560: 0.9694\n",
-      "Accuracy at step 570: 0.97\n",
-      "Accuracy at step 580: 0.9668\n",
-      "Accuracy at step 590: 0.9667\n",
-      "Adding run metadata for 599\n",
-      "Accuracy at step 600: 0.9675\n",
-      "Accuracy at step 610: 0.9685\n",
-      "Accuracy at step 620: 0.9692\n",
-      "Accuracy at step 630: 0.9695\n",
-      "Accuracy at step 640: 0.9657\n",
-      "Accuracy at step 650: 0.9648\n",
-      "Accuracy at step 660: 0.9707\n",
-      "Accuracy at step 670: 0.9689\n",
-      "Accuracy at step 680: 0.9716\n",
-      "Accuracy at step 690: 0.9698\n",
-      "Adding run metadata for 699\n",
-      "Accuracy at step 700: 0.9667\n",
-      "Accuracy at step 710: 0.9632\n",
-      "Accuracy at step 720: 0.9678\n",
-      "Accuracy at step 730: 0.9664\n",
-      "Accuracy at step 740: 0.9688\n",
-      "Accuracy at step 750: 0.9662\n",
-      "Accuracy at step 760: 0.9705\n",
-      "Accuracy at step 770: 0.9686\n",
-      "Accuracy at step 780: 0.9692\n",
-      "Accuracy at step 790: 0.9662\n",
-      "Adding run metadata for 799\n",
-      "Accuracy at step 800: 0.9619\n",
-      "Accuracy at step 810: 0.9667\n",
-      "Accuracy at step 820: 0.968\n",
-      "Accuracy at step 830: 0.9688\n",
-      "Accuracy at step 840: 0.9707\n",
-      "Accuracy at step 850: 0.9726\n",
-      "Accuracy at step 860: 0.9716\n",
-      "Accuracy at step 870: 0.9708\n",
-      "Accuracy at step 880: 0.9707\n",
-      "Accuracy at step 890: 0.9672\n",
-      "Adding run metadata for 899\n",
-      "Accuracy at step 900: 0.9601\n",
-      "Accuracy at step 910: 0.9664\n",
-      "Accuracy at step 920: 0.9665\n",
-      "Accuracy at step 930: 0.9695\n",
-      "Accuracy at step 940: 0.969\n",
-      "Accuracy at step 950: 0.9677\n",
-      "Accuracy at step 960: 0.9688\n",
-      "Accuracy at step 970: 0.9576\n",
-      "Accuracy at step 980: 0.9547\n",
-      "Accuracy at step 990: 0.9679\n",
-      "Adding run metadata for 999\n"
-     ],
-     "output_type": "stream"
+      "Extracting /tmp/mnist-data/t10k-labels-idx1-ubyte.gz\n",
+      "job name = worker\n",
+      "task index = 0\n",
+      "Worker 0: Initializing session...\n",
+      "Worker 0: Session initialization complete.\n",
+      "Training begins @ 1592969996.537955\n",
+      "1592969997.322857: Worker 0: training step 1 done (global step: 0)\n",
+      "1592969997.333140: Worker 0: training step 2 done (global step: 1)\n",
+      "1592969997.342255: Worker 0: training step 3 done (global step: 2)\n",
+      "1592969997.350622: Worker 0: training step 4 done (global step: 3)\n",
+      "1592969997.358247: Worker 0: training step 5 done (global step: 4)\n",
+      "1592969997.365204: Worker 0: training step 6 done (global step: 5)\n",
+      "1592969997.376976: Worker 0: training step 7 done (global step: 6)\n",
+      "1592969997.383788: Worker 0: training step 8 done (global step: 7)\n",
+      "1592969997.389909: Worker 0: training step 9 done (global step: 8)\n",
+      "1592969997.399034: Worker 0: training step 10 done (global step: 9)\n",
+      "1592969997.406169: Worker 0: training step 11 done (global step: 10)\n",
+      "1592969997.413243: Worker 0: training step 12 done (global step: 11)\n",
+      "1592969997.419582: Worker 0: training step 13 done (global step: 12)\n",
+      "1592969997.426087: Worker 0: training step 14 done (global step: 13)\n",
+      "1592969997.432481: Worker 0: training step 15 done (global step: 14)\n",
+      "1592969997.438895: Worker 0: training step 16 done (global step: 15)\n",
+      "1592969997.445008: Worker 0: training step 17 done (global step: 16)\n",
+      "1592969997.451046: Worker 0: training step 18 done (global step: 17)\n",
+      "1592969997.458387: Worker 0: training step 19 done (global step: 18)\n",
+      "1592969997.464300: Worker 0: training step 20 done (global step: 19)\n",
+      "1592969997.470169: Worker 0: training step 21 done (global step: 20)\n",
+      "1592969997.492154: Worker 0: training step 22 done (global step: 21)\n",
+      "1592969997.500725: Worker 0: training step 23 done (global step: 22)\n",
+      "1592969997.510641: Worker 0: training step 24 done (global step: 23)\n",
+      "1592969997.519666: Worker 0: training step 25 done (global step: 24)\n",
+      "1592969997.527392: Worker 0: training step 26 done (global step: 25)\n",
+      "1592969997.535852: Worker 0: training step 27 done (global step: 26)\n",
+      "1592969997.544154: Worker 0: training step 28 done (global step: 27)\n",
+      "1592969997.550987: Worker 0: training step 29 done (global step: 28)\n",
+      "1592969997.558344: Worker 0: training step 30 done (global step: 29)\n",
+      "1592969997.564822: Worker 0: training step 31 done (global step: 30)\n",
+      "1592969997.571622: Worker 0: training step 32 done (global step: 31)\n",
+      "1592969997.578554: Worker 0: training step 33 done (global step: 32)\n",
+      "1592969997.595638: Worker 0: training step 34 done (global step: 33)\n",
+      "1592969997.603068: Worker 0: training step 35 done (global step: 34)\n",
+      "1592969997.611962: Worker 0: training step 36 done (global step: 35)\n",
+      "1592969997.618786: Worker 0: training step 37 done (global step: 36)\n",
+      "1592969997.625508: Worker 0: training step 38 done (global step: 37)\n",
+      "1592969997.634181: Worker 0: training step 39 done (global step: 38)\n",
+      "1592969997.642113: Worker 0: training step 40 done (global step: 39)\n",
+      "1592969997.649647: Worker 0: training step 41 done (global step: 40)\n",
+      "1592969997.656734: Worker 0: training step 42 done (global step: 41)\n",
+      "1592969997.665110: Worker 0: training step 43 done (global step: 42)\n",
+      "1592969997.673620: Worker 0: training step 44 done (global step: 43)\n",
+      "1592969997.693670: Worker 0: training step 45 done (global step: 44)\n",
+      "1592969997.700257: Worker 0: training step 46 done (global step: 45)\n",
+      "1592969997.705834: Worker 0: training step 47 done (global step: 46)\n",
+      "1592969997.714062: Worker 0: training step 48 done (global step: 47)\n",
+      "1592969997.720700: Worker 0: training step 49 done (global step: 48)\n",
+      "1592969997.746550: Worker 0: training step 50 done (global step: 49)\n",
+      "1592969997.755566: Worker 0: training step 51 done (global step: 50)\n",
+      "1592969997.768644: Worker 0: training step 52 done (global step: 51)\n",
+      "1592969997.775591: Worker 0: training step 53 done (global step: 52)\n",
+      "1592969997.782266: Worker 0: training step 54 done (global step: 53)\n",
+      "1592969997.789567: Worker 0: training step 55 done (global step: 54)\n",
+      "1592969997.796607: Worker 0: training step 56 done (global step: 55)\n",
+      "1592969997.804746: Worker 0: training step 57 done (global step: 56)\n",
+      "1592969997.811790: Worker 0: training step 58 done (global step: 57)\n",
+      "1592969997.820524: Worker 0: training step 59 done (global step: 58)\n",
+      "1592969997.828779: Worker 0: training step 60 done (global step: 59)\n",
+      "1592969997.837011: Worker 0: training step 61 done (global step: 60)\n",
+      "1592969997.844103: Worker 0: training step 62 done (global step: 61)\n",
+      "1592969997.850421: Worker 0: training step 63 done (global step: 62)\n",
+      "1592969997.857403: Worker 0: training step 64 done (global step: 63)\n",
+      "1592969997.863736: Worker 0: training step 65 done (global step: 64)\n",
+      "1592969997.893540: Worker 0: training step 66 done (global step: 65)\n",
+      "1592969997.901177: Worker 0: training step 67 done (global step: 66)\n",
+      "1592969997.907805: Worker 0: training step 68 done (global step: 67)\n",
+      "1592969997.916197: Worker 0: training step 69 done (global step: 68)\n",
+      "1592969997.924106: Worker 0: training step 70 done (global step: 69)\n",
+      "1592969997.946289: Worker 0: training step 71 done (global step: 70)\n",
+      "1592969997.953352: Worker 0: training step 72 done (global step: 71)\n",
+      "1592969997.959779: Worker 0: training step 73 done (global step: 72)\n",
+      "1592969997.966829: Worker 0: training step 74 done (global step: 73)\n",
+      "1592969997.975579: Worker 0: training step 75 done (global step: 74)\n",
+      "1592969997.981944: Worker 0: training step 76 done (global step: 75)\n",
+      "1592969997.992360: Worker 0: training step 77 done (global step: 76)\n",
+      "1592969997.998984: Worker 0: training step 78 done (global step: 77)\n",
+      "1592969998.005780: Worker 0: training step 79 done (global step: 78)\n",
+      "1592969998.019416: Worker 0: training step 80 done (global step: 79)\n",
+      "1592969998.026951: Worker 0: training step 81 done (global step: 80)\n",
+      "1592969998.033177: Worker 0: training step 82 done (global step: 81)\n",
+      "1592969998.040482: Worker 0: training step 83 done (global step: 82)\n",
+      "1592969998.047058: Worker 0: training step 84 done (global step: 83)\n",
+      "1592969998.053640: Worker 0: training step 85 done (global step: 84)\n",
+      "1592969998.060095: Worker 0: training step 86 done (global step: 85)\n",
+      "1592969998.066217: Worker 0: training step 87 done (global step: 86)\n",
+      "1592969998.071884: Worker 0: training step 88 done (global step: 87)\n",
+      "1592969998.078604: Worker 0: training step 89 done (global step: 88)\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "1592969998.099135: Worker 0: training step 90 done (global step: 89)\n",
+      "1592969998.105798: Worker 0: training step 91 done (global step: 90)\n",
+      "1592969998.112137: Worker 0: training step 92 done (global step: 91)\n",
+      "1592969998.118540: Worker 0: training step 93 done (global step: 92)\n",
+      "1592969998.125359: Worker 0: training step 94 done (global step: 93)\n",
+      "1592969998.131841: Worker 0: training step 95 done (global step: 94)\n",
+      "1592969998.137258: Worker 0: training step 96 done (global step: 95)\n",
+      "1592969998.143857: Worker 0: training step 97 done (global step: 96)\n",
+      "1592969998.150290: Worker 0: training step 98 done (global step: 97)\n",
+      "1592969998.158311: Worker 0: training step 99 done (global step: 98)\n",
+      "1592969998.164789: Worker 0: training step 100 done (global step: 99)\n",
+      "1592969998.172325: Worker 0: training step 101 done (global step: 100)\n",
+      "Training ends @ 1592969998.172426\n",
+      "Training elapsed time: 1.634471 s\n",
+      "After 100 training step(s), validation cross entropy = 1084.43\n"
+     ]
     }
    ],
    "source": [
@@ -362,18 +478,47 @@
    "execution_count": 10,
    "metadata": {
     "pycharm": {
-     "name": "#%%\n",
-     "is_executing": false
+     "is_executing": false,
+     "name": "#%%\n"
     }
    },
    "outputs": [
     {
      "data": {
-      "text/plain": "{'experimentId': 'experiment_1592480334465_0001',\n 'name': 'mnist',\n 'uid': '47f62be1-9d5a-473b-bac2-a7d40c365b45',\n 'status': 'Deleted',\n 'acceptedTime': '2020-06-18T19:39:54.000+08:00',\n 'createdTime': '2020-06-18T19:39:54.000+08:00',\n 'runningTime': '2020-06-18T19:39:55.000+08:00',\n 'finishedTime': '2020-06-18T19:43:23.000+08:00',\n 'spec': {'meta': {'name': 'mnist',\n   'namespace': 'submarine',\n   'framework': 'Tensorflow',\n   'cmd': 'python /var/tf_mni [...]
+      "text/plain": [
+       "{'experimentId': 'experiment_1592969710478_0001',\n",
+       " 'name': 'mnist-dist',\n",
+       " 'uid': '360886c6-b5cc-11ea-b5f2-025000000001',\n",
+       " 'status': 'Deleted',\n",
+       " 'acceptedTime': '2020-06-24T11:38:47.000+08:00',\n",
+       " 'createdTime': '2020-06-24T11:38:47.000+08:00',\n",
+       " 'runningTime': '2020-06-24T11:38:49.000+08:00',\n",
+       " 'finishedTime': '2020-06-24T11:40:00.000+08:00',\n",
+       " 'spec': {'meta': {'name': 'mnist-dist',\n",
+       "   'namespace': 'default',\n",
+       "   'framework': 'Tensorflow',\n",
+       "   'cmd': 'python /var/tf_dist_mnist/dist_mnist.py --train_steps=100',\n",
+       "   'envVars': {'ENV1': 'ENV1'}},\n",
+       "  'environment': {'image': 'gcr.io/kubeflow-ci/tf-dist-mnist-test:1.0'},\n",
+       "  'spec': {'Ps': {'replicas': 1,\n",
+       "    'resources': 'cpu=1,memory=1024M',\n",
+       "    'name': None,\n",
+       "    'image': None,\n",
+       "    'cmd': None,\n",
+       "    'envVars': None,\n",
+       "    'resourceMap': {'memory': '1024M', 'cpu': '1'}},\n",
+       "   'Worker': {'replicas': 1,\n",
+       "    'resources': 'cpu=1,memory=1024M',\n",
+       "    'name': None,\n",
+       "    'image': None,\n",
+       "    'cmd': None,\n",
+       "    'envVars': None,\n",
+       "    'resourceMap': {'memory': '1024M', 'cpu': '1'}}}}}"
+      ]
      },
+     "execution_count": 10,
      "metadata": {},
-     "output_type": "execute_result",
-     "execution_count": 10
+     "output_type": "execute_result"
     }
    ],
    "source": [
@@ -404,15 +549,15 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.6"
+   "version": "3.7.7"
   },
   "pycharm": {
    "stem_cell": {
     "cell_type": "raw",
-    "source": [],
     "metadata": {
      "collapsed": false
-    }
+    },
+    "source": []
    }
   }
  },


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]