This is an automated email from the ASF dual-hosted git repository.
pingsutw pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git
The following commit(s) were added to refs/heads/master by this push:
new 8eb3e20 SUBMARINE-1118. Remove relevant yarn pages in the documentation
8eb3e20 is described below
commit 8eb3e20a034ffb86aeb4328a46b22cb621ea0a90
Author: woodcutter-eric <[email protected]>
AuthorDate: Sun Dec 5 17:39:36 2021 +0800
SUBMARINE-1118. Remove relevant yarn pages in the documentation
### What is this PR for?
Remove the outdated YARN documentation.
### What type of PR is it?
[ Documentation ]
### What is the Jira issue?
https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-1118?filter=myopenissues
### How should this be tested?
### Screenshots (if appropriate)
### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need new documentation? No
Author: woodcutter-eric <[email protected]>
Signed-off-by: Kevin <[email protected]>
Closes #818 from woodcutter-eric/SUBMARINE-1118 and squashes the following commits:
c2156583 [woodcutter-eric] SUBMARINE-1118. Remove some yarn description in the documentation
---
website/docs/adminDocs/yarn/README.md | 2 +-
.../designDocs/architecture-and-requirements.md | 106 ++++++++++-----------
.../docs/designDocs/experiment-implementation.md | 100 +++++++++----------
website/docs/designDocs/implementation-notes.md | 6 +-
.../designDocs/submarine-server/architecture.md | 56 +++++------
.../designDocs/submarine-server/experimentSpec.md | 6 +-
.../designDocs/wip-designs/submarine-launcher.md | 39 ++++----
website/docs/devDocs/README.md | 4 +-
8 files changed, 158 insertions(+), 161 deletions(-)
diff --git a/website/docs/adminDocs/yarn/README.md
b/website/docs/adminDocs/yarn/README.md
index 50fb134..cb5932c 100644
--- a/website/docs/adminDocs/yarn/README.md
+++ b/website/docs/adminDocs/yarn/README.md
@@ -1,5 +1,5 @@
---
-title: Running Submarine on YARN
+title: Running Submarine on YARN (deprecated)
---
<!--
diff --git a/website/docs/designDocs/architecture-and-requirements.md
b/website/docs/designDocs/architecture-and-requirements.md
index 3aac849..7041142 100644
--- a/website/docs/designDocs/architecture-and-requirements.md
+++ b/website/docs/designDocs/architecture-and-requirements.md
@@ -26,49 +26,49 @@ title: Architecture and Requirment
| Admin | Also called SRE, who manages user's quotas, credentials, team, and
other components. |
-## Background
+## Background
-Everybody talks about machine learning today, and lots of companies are trying
to leverage machine learning to push the business to the next level. Nowadays,
as more and more developers, infrastructure software companies coming to this
field, machine learning becomes more and more achievable.
+Everybody talks about machine learning today, and lots of companies are trying
to leverage machine learning to push the business to the next level. Nowadays,
as more and more developers, infrastructure software companies coming to this
field, machine learning becomes more and more achievable.
-In the last decade, the software industry has built many open source tools for
machine learning to solve the pain points:
+In the last decade, the software industry has built many open source tools for
machine learning to solve the pain points:
1. It was not easy to build machine learning algorithms manually, such as
logistic regression, GBDT, and many other algorithms:
- **Answer to that:** Industries have open sourced many algorithm libraries,
tools, and even pre-trained models so that data scientists can directly reuse
these building blocks to hook up to their data without knowing intricate
details inside these algorithms and models.
+ **Answer to that:** Industries have open sourced many algorithm libraries,
tools, and even pre-trained models so that data scientists can directly reuse
these building blocks to hook up to their data without knowing intricate
details inside these algorithms and models.
-2. It was not easy to achieve "WYSIWYG, what you see is what you get" from
IDEs: not easy to get output, visualization, troubleshooting experiences at the
same place.
+2. It was not easy to achieve "WYSIWYG, what you see is what you get" from
IDEs: not easy to get output, visualization, troubleshooting experiences at the
same place.
**Answer to that:** Notebooks concept was added to this picture, notebook
brought the experiences of interactive coding, sharing, visualization,
debugging under the same user interface. There're popular open-source notebooks
like Apache Zeppelin/Jupyter.
-
-3. It was not easy to manage dependencies: ML applications can run on one
machine is hard to deploy on another machine because it has lots of libraries
dependencies.
- **Answer to that:** Containerization becomes popular and a standard to
packaging dependencies to make it easier to "build once, run anywhere".
+
+3. It was not easy to manage dependencies: ML applications can run on one
machine is hard to deploy on another machine because it has lots of libraries
dependencies.
+ **Answer to that:** Containerization becomes popular and a standard to
packaging dependencies to make it easier to "build once, run anywhere".
4. Fragmented tools, libraries were hard for ML engineers to learn.
Experiences learned in one company are not naturally migratable to another
company.
**Answer to that:** A few dominant open-source frameworks reduced the
overhead of learning too many different frameworks, concepts. Data-scientist
can learn a few libraries such as Tensorflow/PyTorch, and a few high-level
wrappers like Keras will be able to create your machine learning application
from other open-source building blocks.
5. Similarly, models built by one library (such as libsvm) were hard to be
integrated into machine learning pipeline since there's no standard format.
**Answer to that:** Industry has built successful open-source standard
machine learning frameworks such as Tensorflow/PyTorch/Keras so their format
can be easily shared across. And efforts to build an even more general model
format such as ONNX.
-
-6. It was hard to build a data pipeline that flows/transform data from a raw
data source to whatever required by ML applications.
+
+6. It was hard to build a data pipeline that flows/transform data from a raw
data source to whatever required by ML applications.
**Answer to that:** Open source big data industry plays an important role
in providing, simplify, unify processes and building blocks for data flows,
transformations, etc.
-The machine learning industry is moving on the right track to solve major
roadblocks. So what are the pain points now for companies which have machine
learning needs? What can we help here? To answer this question, let's look at
machine learning workflow first.
+The machine learning industry is moving on the right track to solve major
roadblocks. So what are the pain points now for companies which have machine
learning needs? What can we help here? To answer this question, let's look at
machine learning workflow first.
## Machine Learning Workflows & Pain points
```
1) From different data sources such as edge, clickstream, logs, etc.
- => Land to data lakes
-
-2) From data lake, data transformation:
- => Data transformations: Cleanup, remove invalid rows/columns,
+ => Land to data lakes
+
+2) From data lake, data transformation:
+ => Data transformations: Cleanup, remove invalid rows/columns,
select columns, sampling, split train/test
data-set, join table, etc.
=> Data prepared for training.
-
-3) From prepared data:
- => Training, model hyper-parameter tuning, cross-validation, etc.
- => Models saved to storage.
-
-4) From saved models:
+
+3) From prepared data:
+ => Training, model hyper-parameter tuning, cross-validation, etc.
+ => Models saved to storage.
+
+4) From saved models:
=> Model assurance, deployment, A/B testing, etc.
=> Model deployed for online serving or offline scoring.
```
@@ -77,15 +77,15 @@ Typically data scientists responsible for item 2)-4), 1)
typically handled by a
### Pain \#1 Complex workflow/steps from raw data to model, different tools
needed by different steps, hard to make changes to workflow, and not error-proof
-It is a complex workflow from raw data to usable models, after talking to many
different data scientists, we have learned that a typical procedure to train a
new model and push to production can take months to 1-2 years.
+It is a complex workflow from raw data to usable models, after talking to many
different data scientists, we have learned that a typical procedure to train a
new model and push to production can take months to 1-2 years.
-It is also a wide skill set required by this workflow. For example, data
transformation needs tools like Spark/Hive for large scale and tools like
Pandas for a small scale. And model training needs to be switched between
XGBoost, Tensorflow, Keras, PyTorch. Building a data pipeline requires Apache
Airflow or Oozie.
+It is also a wide skill set required by this workflow. For example, data
transformation needs tools like Spark/Hive for large scale and tools like
Pandas for a small scale. And model training needs to be switched between
XGBoost, Tensorflow, Keras, PyTorch. Building a data pipeline requires Apache
Airflow or Oozie.
Yes, there are great, standardized open-source tools built for many of such
purposes. But how about changes need to be made for a particular part of the
data pipeline? How about adding a few columns to the training data for
experiments? How about training models, and push models to validation, A/B
testing before rolling to production? All these steps need jumping between
different tools, UIs, and very hard to make changes, and it is not error-proof
during these procedures.
### Pain \#2 Dependencies of underlying resource management platform
-To make jobs/services required by a machine learning platform to be able to
run, we need an underlying resource management platform. There're some choices
of resource management platform, and they have distinct advantages and
disadvantages.
+To make jobs/services required by a machine learning platform to be able to
run, we need an underlying resource management platform. There're some choices
of resource management platform, and they have distinct advantages and
disadvantages.
For example, there're many machine learning platform built on top of K8s. It
is relatively easy to get a K8s from a cloud vendor, easy to orchestrate
machine learning required services/daemons run on K8s. However, K8s doesn't
offer good support jobs like Spark/Flink/Hive. So if your company has
Spark/Flink/Hive running on YARN, there're gaps and a significant amount of
work to move required jobs from YARN to K8s. Maintaining a separate K8s cluster
is also overhead to Hadoop-based data in [...]
@@ -95,7 +95,7 @@ Similarly, if your company's data pipelines are mostly built
on top of cloud res
In addition to the above pain, we do see Data Scientists are forced to learn
underlying platform knowledge to be able to build a real-world machine learning
workflow.
-For most of the data scientists we talked with, they're experts of ML
algorithms/libraries, feature engineering, etc. They're also most familiar with
Python, R, and some of them understand Spark, Hive, etc.
+For most of the data scientists we talked with, they're experts of ML
algorithms/libraries, feature engineering, etc. They're also most familiar with
Python, R, and some of them understand Spark, Hive, etc.
If they're asked to do interactions with lower-level components like
fine-tuning a Spark job's performance; or troubleshooting job failed to launch
because of resource constraints; or write a K8s/YARN job spec and mount
volumes, set networks properly. They will scratch their heads and typically
cannot perform these operations efficiently.
@@ -115,11 +115,11 @@ An abstraction layer/framework to help the developer to
boost ML pipeline develo
### A little bit history
-Initially, Submarine is built to solve problems of running deep learning jobs
like Tensorflow/PyTorch on Apache Hadoop YARN, allows admin to monitor launched
deep learning jobs, and manage generated models.
+Initially, Submarine is built to solve problems of running deep learning jobs
like Tensorflow/PyTorch on Apache Hadoop YARN, allows admin to monitor launched
deep learning jobs, and manage generated models.
It was part of YARN initially, and code resides under
`hadoop-yarn-applications`. Later, the community decided to convert it to be a
subproject within Hadoop (Sibling project of YARN, HDFS, etc.) because we want
to support other resource management platforms like K8s. And finally, we're
reconsidering Submarine's charter, and the Hadoop community voted that it is
the time to moved Submarine to a separate Apache TLP.
-### Why Submarine?
+### Why Submarine?
`ONE PLATFORM`
@@ -145,22 +145,22 @@ A running notebook instance is called notebook session
(or session for short).
### Experiment
-Experiments of Submarine is an offline task. It could be a shell command, a
Python command, a Spark job, a SQL query, or even a workflow.
+Experiments of Submarine is an offline task. It could be a shell command, a
Python command, a Spark job, a SQL query, or even a workflow.
The primary purposes of experiments under Submarine's context is to do
training tasks, offline scoring, etc. However, experiment can be generalized to
do other tasks as well.
-Major requirement of experiment:
+Major requirement of experiment:
1) Experiments can be submitted from UI/CLI/SDK.
2) Experiments can be monitored/managed from UI/CLI/SDK.
-3) Experiments should not bind to one resource management platform (K8s/YARN).
+3) Experiments should not bind to one resource management platform (K8s).
#### Type of experiments

-There're two types of experiments:
-`Adhoc experiments`: which includes a Python/R/notebook, or even an adhoc
Tensorflow/PyTorch task, etc.
+There're two types of experiments:
+`Adhoc experiments`: which includes a Python/R/notebook, or even an adhoc
Tensorflow/PyTorch task, etc.
`Predefined experiment library`: This is specialized experiments, which
including developed libraries such as CTR, BERT, etc. Users are only required
to specify a few parameters such as input, output, hyper parameters, etc.
Instead of worrying about where's training script/dependencies located.
@@ -169,15 +169,15 @@ There're two types of experiments:
Requirements:
- Allow run adhoc scripts.
-- Allow model engineer, data scientist to run Tensorflow/Pytorch programs on
YARN/K8s/Container-cloud.
-- Allow jobs easy access data/models in HDFS/s3, etc.
+- Allow model engineer, data scientist to run Tensorflow/Pytorch programs on
K8s/Container-cloud.
+- Allow jobs easy access data/models in HDFS/s3, etc.
- Support run distributed Tensorflow/Pytorch jobs with simple configs.
- Support run user-specified Docker images.
- Support specify GPU and other resources.
#### Predefined experiment library
-Here's an example of predefined experiment library to train deepfm model:
+Here's an example of predefined experiment library to train deepfm model:
```
{
@@ -205,20 +205,20 @@ Predefined experiment libraries can be shared across
users on the same platform,
We will also model AutoML, auto hyper-parameter tuning to predefined
experiment library.
-#### Pipeline
+#### Pipeline
Pipeline is a special kind of experiment:
-- A pipeline is a DAG of experiments.
+- A pipeline is a DAG of experiments.
- Can be also treated as a special kind of experiment.
- Users can submit/terminate a pipeline.
- Pipeline can be created/submitted via UI/API.
### Environment Profiles
-Environment profiles (or environment for short) defines a set of libraries and
when Docker is being used, a Docker image in order to run an experiment or a
notebook.
+Environment profiles (or environment for short) defines a set of libraries and
when Docker is being used, a Docker image in order to run an experiment or a
notebook.
-Docker or VM image (such as AMI: Amazon Machine Images) defines the base layer
of the environment.
+Docker or VM image (such as AMI: Amazon Machine Images) defines the base layer
of the environment.
On top of that, users can define a set of libraries (such as Python/R) to
install.
@@ -228,16 +228,16 @@ Environments can be added/listed/deleted/selected through
CLI/SDK.
### Model
-#### Model management
+#### Model management
- Model artifacts are generated by experiments or notebook.
-- A model consists of artifacts from one or multiple files.
+- A model consists of artifacts from one or multiple files.
- Users can choose to save, tag, version a produced model.
- Once The Model is saved, Users can do the online model serving or offline
scoring of the model.
#### Model serving
-After model saved, users can specify a serving script, a model and create a
web service to serve the model.
+After model saved, users can specify a serving script, a model and create a
web service to serve the model.
We call the web service to "endpoint". Users can manage (add/stop) model
serving endpoints via CLI/API/UI.
@@ -247,36 +247,36 @@ Submarine-SDK provides tracking/metrics APIs, which
allows developers to add tra
### Deployment
-Submarine Services (See architecture overview below) should be deployed easily
on-prem / on-cloud. Since there're more and more public cloud offering for
compute/storage management on cloud, we need to support deploy Submarine
compute-related workloads (such as notebook session, experiments, etc.) to
cloud-managed clusters.
+Submarine Services (See architecture overview below) should be deployed easily
on-prem / on-cloud. Since there're more and more public cloud offering for
compute/storage management on cloud, we need to support deploy Submarine
compute-related workloads (such as notebook session, experiments, etc.) to
cloud-managed clusters.
This also include Submarine may need to take input parameters from customers
and create/manage clusters if needed. It is also a common requirement to use
hybrid of on-prem/on-cloud clusters.
### Security / Access Control / User Management / Quota Management
-There're 4 kinds of objects need access-control:
+There're 4 kinds of objects need access-control:
- Assets belong to Submarine system, which includes notebook, experiments and
results, models, predefined experiment libraries, environment profiles.
-- Data security. (Who owns what data, and what data can be accessed by each
users).
+- Data security. (Who owns what data, and what data can be accessed by each
users).
- User credentials. (Such as LDAP).
- Other security, such as Git repo access, etc.
-For the data security / user credentials / other security, it will be
delegated to 3rd libraries such as Apache Ranger, IAM roles, etc.
+For the data security / user credentials / other security, it will be
delegated to 3rd libraries such as Apache Ranger, IAM roles, etc.
Assets belong to Submarine system will be handled by Submarine itself.
-Here're operations which Submarine admin can do for users / teams which can be
used to access Submarine's assets.
+Here're operations which Submarine admin can do for users / teams which can be
used to access Submarine's assets.
-**Operations for admins**
+**Operations for admins**
-- Admin uses "User Management System" to onboard new users, upload user
credentials, assign resource quotas, etc.
-- Admins can create new users, new teams, update user/team mappings. Or remove
users/teams.
+- Admin uses "User Management System" to onboard new users, upload user
credentials, assign resource quotas, etc.
+- Admins can create new users, new teams, update user/team mappings. Or remove
users/teams.
- Admin can set resource quotas (if different from system default),
permissions, upload/update necessary credentials (like Kerberos keytab) of a
user.
- A DE/DS can also be an admin if the DE/DS has admin access. (Like a
privileged user). This will be useful when a cluster is exclusively shared by a
user or only shared by a small team.
- `Resource Quota Management System` helps admin to manage resources quotas of
teams, organizations. Resources can be machine resources like CPU/Memory/Disk,
etc. It can also include non-machine resources like $$-based budgets.
-### Dataset
+### Dataset
-There's also need to tag dataset which will be used for training and shared
across the platform by different users.
+There's also need to tag dataset which will be used for training and shared
across the platform by different users.
Like mentioned above, access to the actual data will be handled by 3rd party
system like Apache Ranger / Hive Metastore which is out of the Submarine's
scope.
@@ -300,7 +300,7 @@ Like mentioned above, access to the actual data will be
handled by 3rd party sys
| |Experiment | |Compute Resource | |Other Management | |
| |Manager | | Manager | |Services | |
| +-----------------+ +-----------------+ +---------------------+ |
- | Spark, template YARN/K8s/Docker |
+ | Spark, template K8s/Docker |
| TF, PyTorch, pipeline |
| |
+ +-----------------+ +
diff --git a/website/docs/designDocs/experiment-implementation.md
b/website/docs/designDocs/experiment-implementation.md
index a87bb89..ea110da 100644
--- a/website/docs/designDocs/experiment-implementation.md
+++ b/website/docs/designDocs/experiment-implementation.md
@@ -20,7 +20,7 @@ title: Experiment Implementation
This document talks about implementation of experiment, flows and design
considerations.
-Experiment consists of following components, also interact with other
Submarine or 3rd-party components, showing below:
+Experiment consists of following components, also interact with other
Submarine or 3rd-party components, showing below:
```
@@ -44,18 +44,18 @@ Experiment consists of following components, also interact
with other Submarine
| (Launch Task with resources)
+
+---------------------------------+
- |Resource Manager (K8s/YARN/Cloud)|
+ |Resource Manager (K8s/Cloud)|
+---------------------------------+
```
-As showing in the above diagram, Submarine experiment consists of the
following items:
+As showing in the above diagram, Submarine experiment consists of the
following items:
-- On the left side, there're input data and run configs.
-- In the middle box, they're experiment tasks, it could be multiple tasks when
we run distributed training, pipeline, etc.
- - There're main runnable code, such as `train.py` for the training main
entry point.
- - The two boxes below: experiment dependencies and OS/Base libraries we
called `Submarine Environment Profile` or `Environment` for short. Which
defined what is the basic libraries to run the main experiment code.
- - Experiment tasks are launched by Resource Manager, such as K8s/YARN/Cloud
or just launched locally. There're resources constraints for each experiment
tasks. (e.g. how much memory, cores, GPU, disk etc. can be used by tasks).
-- On the right side, they're artifacts generated by experiments:
+- On the left side, there're input data and run configs.
+- In the middle box, they're experiment tasks, it could be multiple tasks when
we run distributed training, pipeline, etc.
+ - There're main runnable code, such as `train.py` for the training main
entry point.
+ - The two boxes below: experiment dependencies and OS/Base libraries we
called `Submarine Environment Profile` or `Environment` for short. Which
defined what is the basic libraries to run the main experiment code.
+ - Experiment tasks are launched by Resource Manager, such as K8s/Cloud or
just launched locally. There're resources constraints for each experiment
tasks. (e.g. how much memory, cores, GPU, disk etc. can be used by tasks).
+- On the right side, they're artifacts generated by experiments:
- Output artifacts: Which are main output of the experiment, it could be
model(s), or output data when we do batch prediction.
- Logs/Metrics for further troubleshooting or understanding of experiment's
quality.
@@ -63,7 +63,7 @@ For the rest of the design doc, we will talk about how we
handle environment, co
## API of Experiment
-This is not a full definition of experiment, for more details, please
reference to experiment API.
+This is not a full definition of experiment, for more details, please
reference to experiment API.
Here's just an example of experiment object which help developer to understand
what included in an experiment.
@@ -74,17 +74,17 @@ experiment:
environment: "team-default-ml-env"
code:
sync_mode: s3
- url: "s3://bucket/training-job.tar.gz"
- parameter: > python training.py --iteration 10
+ url: "s3://bucket/training-job.tar.gz"
+ parameter: > python training.py --iteration 10
--input=s3://bucket/input output=s3://bucket/output
- resource_constraint:
+ resource_constraint:
res="mem=20gb, vcore=3, gpu=2"
timeout: "30 mins"
```
-This defined a "script" experiment, which has a name "abc", the name can be
used to track the experiment. There's environment "team-default-ml-env" defined
to make sure dependencies of the job can be downloaded properly before
executing the job.
+This defined a "script" experiment, which has a name "abc", the name can be
used to track the experiment. There's environment "team-default-ml-env" defined
to make sure dependencies of the job can be downloaded properly before
executing the job.
-`code` defined where the experiment code will be downloaded, we will support a
couple of sync_mode like s3 (or abfs/hdfs), git, etc.
+`code` defined where the experiment code will be downloaded, we will support a
couple of sync_mode like s3 (or abfs/hdfs), git, etc.
Different types of experiments will have different specs, for example
distributed Tensorflow spec may look like:
@@ -92,18 +92,18 @@ Different types of experiments will have different specs,
for example distribute
experiment:
name: "abc-distributed-tf",
type: "distributed-tf",
- ps:
+ ps:
environment: "team-default-ml-cpu"
- resource_constraint:
+ resource_constraint:
res="mem=20gb, vcore=3, gpu=0"
- worker:
+ worker:
environment: "team-default-ml-gpu"
- resource_constraint:
+ resource_constraint:
res="mem=20gb, vcore=3, gpu=2"
code:
sync_mode: git
- url: "https://foo.com/training-job.git"
- parameter: > python /code/training-job/training.py --iteration 10
+ url: "https://foo.com/training-job.git"
+ parameter: > python /code/training-job/training.py --iteration 10
--input=s3://bucket/input output=s3://bucket/output
tensorboard: enabled
timeout: "30 mins"
@@ -134,7 +134,7 @@ To better understand experiment implementation, It will be
good to understand wh
Before submit the environment, you have to choose what environment to choose.
Environment defines dependencies, etc. of an experiment or a notebook. might
looks like below:
```
-conda_environment =
+conda_environment =
"""
name: conda-env
channels:
@@ -156,7 +156,7 @@ environment = create_environment {
}
```
-To better understand how environment works, please refer to
[environment-implementation](./environments-implementation.md).
+To better understand how environment works, please refer to
[environment-implementation](./environments-implementation.md).
### Create experiment, specify where's training code located, and parameters.
@@ -164,7 +164,7 @@ For ad-hoc experiment (code located at S3), assume
training code is part of the
```
experiment = create_experiment {
- Environment = environment,
+ Environment = environment,
ExperimentConfig = {
type = "adhoc",
localize_artifacts = [
@@ -184,7 +184,7 @@ It is possible we want to run a notebook file in offline
mode, to do that, here'
```
experiment = create_experiment {
- Environment = environment,
+ Environment = environment,
ExperimentConfig = {
type = "adhoc",
localize_artifacts = [
@@ -203,12 +203,12 @@ experiment.wait_for_finish(print_output=True)
```
experiment = create_experiment {
# Here you can use default environment of library
- Environment = environment,
+ Environment = environment,
ExperimentConfig = {
type = "template",
name = "abc",
- # A unique name of template
- template = "deepfm_ctr",
+ # A unique name of template
+ template = "deepfm_ctr",
# yaml file defined what is the parameters need to be specified.
parameter = {
Input: "S3://.../input",
@@ -238,7 +238,7 @@ There's a common misunderstanding about what is the
differences between running
| Run history (meta, logs, metrics) | Meta/logs/metrics can be traced from
experiment UI (or corresponding API) | No run history can be traced from
Submarine UI/API. Can view the current running paragraph's log/metrics, etc. |
| What to run? | Code from Docker image or shared storage
(like Tarball on S3, Github, etc.) | Local in the notebook's paragraph
|
-**Commonalities**
+**Commonalities**
| | Experiment & Notebook Session |
| ----------- | ------------------------------------------------- |
@@ -254,21 +254,21 @@ The experiment manager receives the experiment requests,
persisting the experime
### Compute Cluster Manager
-After experiment accepted by experiment manager, based on which cluster the
experiment intended to run (like mentioned in the previous sections, Submarine
supports to manage multiple compute clusters), compute cluster manager will
returns credentials to access the compute cluster. It will also be responsible
to create a new compute cluster if needed.
+After experiment accepted by experiment manager, based on which cluster the
experiment intended to run (like mentioned in the previous sections, Submarine
supports to manage multiple compute clusters), compute cluster manager will
returns credentials to access the compute cluster. It will also be responsible
to create a new compute cluster if needed.
-For most of the on-prem use cases, there's only one cluster involved, for such
cases, ComputeClusterManager returns credentials to access local cluster if
needed.
+For most of the on-prem use cases, there's only one cluster involved, for such
cases, ComputeClusterManager returns credentials to access local cluster if
needed.
### Experiment Submitter
-Experiment Submitter handles different kinds of experiments to run (e.g.
ad-hoc script, distributed TF, MPI, pre-defined templates, Pipeline, AutoML,
etc.). And such experiments can be managed by different resource management
systems (e.g. K8s, YARN, container cloud, etc.)
+Experiment Submitter handles different kinds of experiments to run (e.g.
ad-hoc script, distributed TF, MPI, pre-defined templates, Pipeline, AutoML,
etc.). And such experiments can be managed by different resource management
systems (e.g. K8s, container cloud, etc.)
-To meet the requirements to support variant kinds of experiments and resource
managers, we choose to use plug-in modules to support different submitters
(which requires jars to submarine-server’s classpath).
+To meet the requirements to support variant kinds of experiments and resource
managers, we choose to use plug-in modules to support different submitters
(which requires jars to submarine-server’s classpath).
To avoid jars and dependencies of plugins break the submarine-server, the
plug-ins manager, or both. To solve this issue, we can instantiate submitter
plug-ins using a classloader that is different from the system classloader.
#### Submitter Plug-ins
-Each plug-in uses a separate module under the server-submitter module. As the
default implements, we provide for YARN and K8s. For YARN cluster, we provide
the submitter-yarn and submitter-yarnservice plug-ins. The submitter-yarn
plug-in used the [TonY](https://github.com/linkedin/TonY) as the runtime to run
the training job, and the submitter-yarnservice plug-in direct use the [YARN
Service](https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html)
w [...]
+Each plug-in uses a separate module under the server-submitter module. As the
default implements, we provide for K8s.
The submitter-k8s plug-in is used to submit the job to Kubernetes cluster and
use the
[operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) as
the runtime. The submitter-k8s plug-in implements the operation of CRD object
and provides the java interface. In the beginning, we use the
[tf-operator](https://github.com/kubeflow/tf-operator) for the TensorFlow.
@@ -305,7 +305,7 @@ The monitor tracks the experiment life cycle and records
the main events and key
| create a new one.| to submit |+---------------> |
| | Different kinds | Once job is |
| | of experiments | submitted, use |+----+
- | | to k8s/yarn, etc| monitor to get | |
+ | | to k8s, etc| monitor to get | |
| | | status updates | |
| | | | |
Monitor
| | | | |
Xperiment
@@ -325,11 +325,11 @@ TODO: add more details about template, environment, etc.
## Common modules of experiment/notebook-session/model-serving
-Experiment/notebook-session/model-serving share a lot of commonalities, all of
them are:
+Experiment/notebook-session/model-serving share a lot of commonalities, all of
them are:
-- Some workloads running on YARN/K8s.
-- Need persist meta data to DB.
-- Need monitor task/service running status from resource management system.
+- Some workloads running on K8s.
+- Need persist meta data to DB.
+- Need monitor task/service running status from resource management system.
We need to make their implementation are loose-coupled, but at the same time,
share some building blocks as much as possible (e.g. submit PodSpecs to K8s,
monitor status, get logs, etc.) to reduce duplications.
@@ -374,29 +374,29 @@ The template will be (in yaml format):
```yaml
# deepfm.ctr template
name: deepfm.ctr
-author:
+author:
description: >
This is a template to run CTR training using deepfm algorithm, by default it
runs
single node TF job, you can also overwrite training parameters to use
distributed
- training.
-
-parameters:
+ training.
+
+parameters:
- name: input.train_data
- required: true
+ required: true
description: >
- train data is expected in SVM format, and can be stored in HDFS/S3
+ train data is expected in SVM format, and can be stored in HDFS/S3
...
- name: training.batch_size
required: false
- default: 32
+ default: 32
description: This is batch size of training
```
-The batch format can be used in UI/API.
+The batch format can be used in UI/API.
### Handle Predefined-experiment-template from server side
-Please note that, the conversion of predefined-experiment-template will be
always handled by server. The invoke flow looks like:
+Please note that, the conversion of predefined-experiment-template will be
always handled by server. The invoke flow looks like:
```
@@ -431,9 +431,9 @@ Please note that, the conversion of
predefined-experiment-template will be alway
+----------------------------------------------------+
```
-Basically, from Client, it submitted template parameters to Submarine Server,
inside submarine server, it finds the corresponding template handler based on
the name. And the template handler converts input parameters to an actual
experiment, such as a distributed TF experiment. After that, it goes the
similar route to validate experiment spec, compute cluster manager, etc. to get
the experiment submitted and monitored.
+Basically, from Client, it submitted template parameters to Submarine Server,
inside submarine server, it finds the corresponding template handler based on
the name. And the template handler converts input parameters to an actual
experiment, such as a distributed TF experiment. After that, it goes the
similar route to validate experiment spec, compute cluster manager, etc. to get
the experiment submitted and monitored.
-Predefined-experiment-template is able to create any kind of experiment, it
could be a pipeline:
+Predefined-experiment-template is able to create any kind of experiment, it
could be a pipeline:
```
diff --git a/website/docs/designDocs/implementation-notes.md
b/website/docs/designDocs/implementation-notes.md
index 7ebb996..ca226c4 100644
--- a/website/docs/designDocs/implementation-notes.md
+++ b/website/docs/designDocs/implementation-notes.md
@@ -22,12 +22,12 @@ Before digging into details of implementations, you should
read [architecture-an
Here're sub topics of Submarine implementations:
- [Submarine Storage](./storage-implementation.md): How to store metadata,
logs, metrics, etc. of Submarine.
-- [Submarine Environment](./environments-implementation.md): How environments
created, managed, stored in Submarine.
+- [Submarine Environment](./environments-implementation.md): How environments
created, managed, stored in Submarine.
- [Submarine Experiment](./experiment-implementation.md): How experiments
managed, stored, and how the predefined experiment template works.
- [Submarine Notebook](./notebook-implementation.md): How experiments managed,
stored, and how the predefined experiment template works.
- [Submarine Server](./submarine-server/architecture.md): How Submarine server
is designed, architecture, implementation notes, etc.
-Working-in-progress designs, Below are designs which are working-in-progress,
we will move them to the upper section once design & review is finished:
+Working-in-progress designs, Below are designs which are working-in-progress,
we will move them to the upper section once design & review is finished:
- [Submarine HA Design](./wip-designs/submarine-clusterServer.md): How
Submarine HA can be achieved, using RAFT, etc.
-- [Submarine services deployment module:](./wip-designs/submarine-launcher.md)
How to deploy submarine services to k8s, YARN or cloud.
+- [Submarine services deployment module:](./wip-designs/submarine-launcher.md)
How to deploy submarine services to k8s or cloud.
diff --git a/website/docs/designDocs/submarine-server/architecture.md
b/website/docs/designDocs/submarine-server/architecture.md
index 5b572c1..4f1750f 100644
--- a/website/docs/designDocs/submarine-server/architecture.md
+++ b/website/docs/designDocs/submarine-server/architecture.md
@@ -47,10 +47,10 @@ title: Submarine Server Implementation
Here's a diagram to illustrate the Submarine's deployment.
- Submarine Server consists of web service/proxy, and backend services.
They're like "control planes" of Submarine, and users will interact with these
services.
-- Submarine server could be a microservice architecture and can be deployed to
one of the compute clusters. (see below, this will be useful when we only have
one cluster).
+- Submarine server could be a microservice architecture and can be deployed to
one of the compute clusters. (see below, this will be useful when we only have
one cluster).
- There're multiple compute clusters that could be used by Submarine service.
For user's running notebook instance, jobs, etc. they will be placed to one of
the compute clusters by user's preference or defined policies.
- Submarine's asset includes
project/notebook(content)/models/metrics/dataset-meta, etc. can be stored
inside Submarine's own database.
-- Datasets can be stored in various locations such as S3/HDFS.
+- Datasets can be stored in various locations such as S3/HDFS.
- Users can push container (such as Docker) images to a preconfigured registry
in Submarine, so Submarine service can know how to pull required container
images.
- Image Registry/Data-Storage, etc. are outside of Submarine server's scope
and should be managed by 3rd party applications.
@@ -74,7 +74,7 @@ Submarine Server exposed UI and REST API. Users can also use
CLI / SDK to manage
+----------+
```
-REST API will be used by the other 3 approaches. (CLI/SDK/UI)
+REST API will be used by the other 3 approaches. (CLI/SDK/UI)
The REST API Service handles HTTP requests and is responsible for
authentication. It acts as the caller for the JobManager component.
@@ -82,25 +82,25 @@ The REST component defines the generic job spec which
describes the detailed inf
## Proposal
```
-
+---------------------+
- +-----------+ | +--------+
+----+ |
- | | |
|runtime1+-->+job1| |
- | workbench +---+ +----------------------------------+ | +--------+
+----+ |
- | | | | +------+ +---------------------+ | +-->+ +--------+
+----+ |
- +-----------+ | | | | | +------+ +-------+ | | | |
|runtime2+-->+job2| |
- | | | | | | YARN | | K8s | | | | | +--------+
+----+ |
- +-----------+ | | | | | +------+ +-------+ | | | | YARN
Cluster |
- | | | | | | | submitter | | |
+---------------------+
- | CLI +------>+ | REST | +---------------------+ +---+
- | | | | | | +---------------------+ | |
+---------------------+
- +-----------+ | | | | | +-------+ +-------+ | | | | +--------+
+----+ |
- | | | | | |PlugMgr| |monitor| | | | | |
+-->+job1| |
- +-----------+ | | | | | +-------+ +-------+ | | | | | |
+----+ |
- | | | | | | | JobManager | | +-->+ |operator|
+----+ |
- | SDK +---+ | +------+ +---------------------+ | | |
+-->+job2| |
- | | +----------------------------------+ | +--------+
+----+ |
- +-----------+ | K8s
Cluster |
- client server
+---------------------+
+
+ +-----------+
+ | |
+ | workbench +---+ +----------------------------------+
+ | | | | +------+ +---------------------+ |
+ +-----------+ | | | | | +-------+ | |
+---------------------+
+ | | | | | | K8s | | | | +--------+
+----+ |
+ +-----------+ | | | | | +-------+ | | | |
+-->+job1| |
+ | | | | | | | submitter | | | | |
+----+ |
+ | CLI +------>+ | REST | +---------------------+ +---->+ |operator|
+----+ |
+ | | | | | | +---------------------+ | | |
+-->+job2| |
+ +-----------+ | | | | | +-------+ +-------+ | | | +--------+
+----+ |
+ | | | | | |PlugMgr| |monitor| | | | K8s
Cluster |
+ +-----------+ | | | | | +-------+ +-------+ | |
+---------------------+
+ | | | | | | | JobManager | |
+ | SDK +---+ | +------+ +---------------------+ |
+ | | +----------------------------------+
+ +-----------+
+ client server
```
We propose to split the original core module in the old layout into two
modules, CLI and server as shown in FIG. The submarine-client calls the REST
APIs to submit and retrieve the job info. The submarine-server provides the
REST service, job management, submitting the job to cluster, and running job in
different clusters through the corresponding runtime.
@@ -126,11 +126,11 @@ We propose to split the original core module in the old
layout into two modules,
+----------------------------------------------------------------------+
```
-### Experiment Manager
+### Experiment Manager
TODO
-### Notebook Sessions Manager
+### Notebook Sessions Manager
TODO
@@ -142,7 +142,7 @@ TODO
TODO
-### Model Serving Manager
+### Model Serving Manager
TODO
@@ -150,11 +150,11 @@ TODO
TODO
-### Dataset Manager
+### Dataset Manager
TODO
-### User/team permissions manager
+### User/team permissions manager
TODO
@@ -164,4 +164,4 @@ TODO
## Components/services outside of Submarine Server's scope
-TODO: Describe what are the out-of-scope components, which should be handled
and managed outside of Submarine server. Candidates are: Identity management,
data storage, metastore storage, etc.
\ No newline at end of file
+TODO: Describe what are the out-of-scope components, which should be handled
and managed outside of Submarine server. Candidates are: Identity management,
data storage, metastore storage, etc.
diff --git a/website/docs/designDocs/submarine-server/experimentSpec.md
b/website/docs/designDocs/submarine-server/experimentSpec.md
index fc2abfb..f705a87 100644
--- a/website/docs/designDocs/submarine-server/experimentSpec.md
+++ b/website/docs/designDocs/submarine-server/experimentSpec.md
@@ -1,5 +1,5 @@
---
-title: Generic Expeiment Spec
+title: Generic Experiment Spec
---
<!--
@@ -61,13 +61,13 @@ The library spec describes the info about machine learning
framework. All the fi
| envVars | key/value | YES | The public env vars for the task if not
specified. |
### Submitter Spec
-It describes the info of submitter which the user specified, such as yarn,
yarnservice or k8s. All the fields as below:
+It describes the info of submitter which the user specified, such as k8s. All
the fields as below:
| field | type | optional | description |
|---|---|---|---|
| type | string | NO | The submitter type, supports `k8s` now |
| configPath | string | YES | The config path of the specified resource
manager. You can set it in submarine-site.xml if run submarine-server locally |
-| namespace | string | NO | It's known as queue in Apache Hadoop YARN and
namespace in Kubernetes. |
+| namespace | string | NO | It's known as namespace in Kubernetes. |
| kind | string | YES | It's used for k8s submitter, supports TFJob and
PyTorchJob |
| apiVersion | string | YES | It should pair with the kind, such as the
TFJob's api version is `kubeflow.org/v1` |
diff --git a/website/docs/designDocs/wip-designs/submarine-launcher.md
b/website/docs/designDocs/wip-designs/submarine-launcher.md
index 2cc0ee9..6f05d33 100644
--- a/website/docs/designDocs/wip-designs/submarine-launcher.md
+++ b/website/docs/designDocs/wip-designs/submarine-launcher.md
@@ -17,45 +17,45 @@ title: Submarine Launcher
-->
:::warning
-Please note that this design doc is working-in-progress and need more works to
complete.
+Please note that this design doc is working-in-progress and need more works to
complete.
:::
## Introduction
Submarine is built and run in Cloud Native, taking advantage of the cloud
computing model.
-To give full play to the advantages of cloud computing.
-These applications are characterized by rapid and frequent build, release, and
deployment.
-Combined with the features of cloud computing, they are decoupled from the
underlying hardware and operating system,
+To give full play to the advantages of cloud computing.
+These applications are characterized by rapid and frequent build, release, and
deployment.
+Combined with the features of cloud computing, they are decoupled from the
underlying hardware and operating system,
and can easily meet the requirements of scalability, availability, and
portability. And provide better economy.
-In the enterprise data center, submarine can support k8s/yarn/docker three
resource scheduling systems;
+In the enterprise data center, submarine can support k8s/docker three resource
scheduling systems;
in the public cloud environment, submarine can support these cloud services in
GCE/AWS/Azure;
## Requirement
### Cloud-Native Service
-The submarine server is a long-running services in the daemon mode.
-The submarine server is mainly used by algorithm engineers to provide online
front-end functions such as algorithm development,
-algorithm debugging, data processing, and workflow scheduling.
+The submarine server is a long-running services in the daemon mode.
+The submarine server is mainly used by algorithm engineers to provide online
front-end functions such as algorithm development,
+algorithm debugging, data processing, and workflow scheduling.
And submarine server also mainly used for back-end functions such as
scheduling and execution of jobs, tracking of job status, and so on.
-Through the ability of rolling upgrades, we can better provide system
stability.
+Through the ability of rolling upgrades, we can better provide system
stability.
For example, we can upgrade or restart the workbench server without affecting
the normal operation of submitted jobs.
You can also make full use of system resources.
For example, when the number of current developers or job tasks increases,
The number of submarine server instances can be adjusted dynamically.
-In addition, submarine will provide each user with a completely independent
workspace container.
-This workspace container has already deployed the development tools and
library files commonly used by algorithm engineers including their operating
environment.
+In addition, submarine will provide each user with a completely independent
workspace container.
+This workspace container has already deployed the development tools and
library files commonly used by algorithm engineers including their operating
environment.
Algorithm engineers can work in our prepared workspaces without any extra work.
Each user's workspace can also be run through a cloud service.
### Service discovery
-With the cluster function of submarine, each service only needs to run in the
container,
-and it will automatically register the service in the submarine cluster
center.
+With the cluster function of submarine, each service only needs to run in the
container,
+and it will automatically register the service in the submarine cluster center.
Submarine cluster management will automatically maintain the relationship
between service and service, service and user.
## Design
@@ -65,16 +65,16 @@ Submarine cluster management will automatically maintain
the relationship betwee
### Launcher
-The submarine launcher module defines the complete interface.
-By using this interface, you can run the submarine server, and workspace in
k8s / yarn / docker / AWS / GCE / Azure.
+The submarine launcher module defines the complete interface.
+By using this interface, you can run the submarine server, and workspace in
k8s / docker / AWS / GCE / Azure.
### Launcher On Docker
-In order to allow some small and medium-sized users without k8s/yarn to use
submarine,
+In order to allow some small and medium-sized users without k8s to use
submarine,
we support running the submarine system in docker mode.
-Users only need to provide several servers with docker runtime environment.
-The submarine system can automatically cluster these servers into clusters,
manage all the hardware resources of the cluster,
+Users only need to provide several servers with docker runtime environment.
+The submarine system can automatically cluster these servers into clusters,
manage all the hardware resources of the cluster,
and run the service or workspace container in this cluster through scheduling
algorithms.
@@ -82,9 +82,6 @@ and run the service or workspace container in this cluster
through scheduling al
submarine operator
-### Launcher On Yarn
-[TODO]
-
### Launcher On AWS
[TODO]
diff --git a/website/docs/devDocs/README.md b/website/docs/devDocs/README.md
index d25ab35..407fd8e 100644
--- a/website/docs/devDocs/README.md
+++ b/website/docs/devDocs/README.md
@@ -25,7 +25,7 @@ This document mainly describes the structure of each module
of the Submarine pro
### 2.1. submarine-client
-Provide the CLI interface for submarine user. (Currently only support YARN
service)
+Provide the CLI interface for submarine user. (Currently only support YARN
service (deprecated))
### 2.2. submarine-cloud-v2
@@ -45,7 +45,7 @@ Provide Python SDK for submarine user.
### 2.6. submarine-server
-Include core server, restful api, and k8s/yarn submitter.
+Include core server, restful api, and k8s submitter.
### 2.7. submarine-test