yuanzac commented on a change in pull request #143: SUBMARINE-333. Docs of submarine server deployment
URL: https://github.com/apache/submarine/pull/143#discussion_r365754483
 
 

 ##########
 File path: docs/design/submarine-server/jobspec.md
 ##########
 @@ -0,0 +1,100 @@
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Generic Job Spec
+
+## Motivation
+As the machine learning platform, the submarine should support multiple machine learning framework, such as Tensorflow, Pytorch etc. But different framework has different distributed components for the training job. So that we designed a generic job spec to abstract the training job across different frameworks. In this way, the submarine-server can hide the complexity of underlying infrastructure differences and provide a cleaner interface to manager jobs
+
+## Proposal
+Considering the Tensorflow and Pytorch framework, we proposal one spec which consists of library spec, submitter spec and task specs etc. Such as:
+```yaml
+name: "mnist"
+librarySpec:
+  name: "TensorFlow"
+  version: "2.1.0"
+  image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
+  cmd: "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150"
+  envVars:
+    ENV_1: "ENV1"
+submitterSpec:
+  type: "k8s"
+  configPath:
+  namespace: "submarine"
+  kind: "TFJob"
+  apiVersion: "kubeflow.org/v1"
+taskSpecs:
+  Ps:
+    name: tensorflow
+    replicas: 2
+    resources: "cpu=4,memory=2048M,nvidia.com/gpu=1"
+  Worker:
+    name: tensorflow
+    replicas: 2
+    resources: "cpu=4,memory=2048M,nvidia.com/gpu=1"
+```
+
+### Library Spec
+The library spec describe the info about machine learning framework. All the fields as below:
+
+| field | type | optional | description |
+|---|---|---|---|
+| name | string | NO | Machine Learning Framework name. Such as: TensorFlow/PyTorch etc. |
+| version | string | NO | The version of ML framework. Such as: 2.1.0 |
+| image | string | NO | The public image used for each task if not specified. Such as: apache/submarine |
+| cmd | string | YES | The public entry cmd for the task if not specified. |
+| envVars | key/value | YES | The public env vars for the task if not specified. |
+
+### Submitter Spec
+It describe the info of submitter which the user spcified, such as yarn, yarnservice or k8s. All the fields as below:
 
 Review comment:
   Typo: "describe" should be "describes".
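
For readers following this thread, here is a rough sketch of how the spec quoted above might be checked on the client side. It assumes the three Library Spec rows marked optional = NO (`name`, `version`, `image`) are the required fields, and that `resources` is a comma-separated list of `key=value` pairs as in the example. The helper names `missing_library_fields` and `parse_resources` are hypothetical and are not taken from the submarine code base.

```python
# Hypothetical helpers for illustration only; not part of submarine.
REQUIRED_LIBRARY_FIELDS = ("name", "version", "image")  # rows marked optional = NO


def missing_library_fields(library_spec: dict) -> list:
    """Return the required librarySpec fields that are absent or empty."""
    return [f for f in REQUIRED_LIBRARY_FIELDS if not library_spec.get(f)]


def parse_resources(resources: str) -> dict:
    """Split a string such as "cpu=4,memory=2048M" into a key/value dict."""
    parsed = {}
    for item in resources.split(","):
        key, sep, value = item.strip().partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed resource entry: {item!r}")
        parsed[key] = value  # values stay strings; units are not interpreted
    return parsed


# Values copied from the example spec in the quoted diff.
library_spec = {
    "name": "TensorFlow",
    "version": "2.1.0",
    "image": "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0",
}
print(missing_library_fields(library_spec))
print(parse_resources("cpu=4,memory=2048M,nvidia.com/gpu=1"))
```

Values are kept as strings on purpose, since the doc does not say how units such as `2048M` should be interpreted.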

