This is an automated email from the ASF dual-hosted git repository.

kaihsun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git


The following commit(s) were added to refs/heads/master by this push:
     new 8da9f47  SUBMARINE-880. Cannot run two experiments with the same name
8da9f47 is described below

commit 8da9f478de9323dd098d06bbd52afdda3a27ce07
Author: Kai-Hsun Chen <[email protected]>
AuthorDate: Fri Jul 9 00:39:55 2021 +0800

    SUBMARINE-880. Cannot run two experiments with the same name
    
    ### What is this PR for?
    We cannot run two experiments with the same name. Take experimentIT.java as 
an example, the frontend E2E testcase will create an experiment named 
"experiment-e2e-test". If we run this testcase twice, the workbench just shows 
an "experiment-e2e-test" rather than two.
    
    The root cause is that the K8sSubmitter will create two PODs for an 
"experiment-e2e-test". The names of these two PODs are 
"experiment-e2e-test-ps-0" and "experiment-e2e-test-worker-0". Hence, when we 
try to create the second "experiment-e2e-test", K8sSubmitter will throw an 
exception due to duplicate POD names.
    
    To reproduce the bug:
    ```bash
    # Step1: Run workbench on port 8080
    # Step2:
    cd submarine-cloud-v2
    
    # Step3: Create "experiment-e2e-test" twice
    ./hack/run_frontend_e2e.sh experimentIT
    ./hack/run_frontend_e2e.sh experimentIT
    
    # Step4: Check Workbench: only an "experiment-e2e-test" exists => BUG!
    ```
    
    * My solution: (Outdated)
      * The variable "name" indicates the name of Custom Resource (ex: tfjobs, 
pytorchjobs) in k8s.
      * The variable "expName" indicates the name of experiment.
      * ${name} = ${expName} + "-" + ${experimentCounter}
      * For example: an user creates an experiment "experiment-e2e-test"  
(${expName})
        * On the workbench, the experiment name is "experiment-e2e-test". 
(${expName})
        * The name of the tfjobs (a CRD) is "experiment-e2e-test-0001". 
(${name})
        * The value of the label "submarine-experiment-name" in the TFJob is 
"experiment-e2e-test"  (${expName})
        * The names of PODs created by the TFJob "experiment-e2e-test-0001" are:
          ```
          experiment-e2e-test-0001-ps-0
          experiment-e2e-test-0001-worker-0
          ```
    
    ### What type of PR is it?
    [Bug Fix]
    
    ### Todos
    * Check "_" (Reference: 
[Link](https://blog.csdn.net/airangrong6572/article/details/101273245))
    
    ### What is the Jira issue?
    https://issues.apache.org/jira/browse/SUBMARINE-880
    
    ### How should this be tested?
    ```bash
    # Step1: Run workbench on port 8080
    # Step2:
    cd submarine-cloud-v2
    
    # Step3: Create "experiment-e2e-test" twice
    ./hack/run_frontend_e2e.sh experimentIT
    ./hack/run_frontend_e2e.sh experimentIT
    
    # Step4: Check Workbench: two"experiment-e2e-test" exists (Correct)
    ```
    
    ### Screenshots (if appropriate)
    <img width="1141" alt="截圖 2021-07-08 上午1 58 10" 
src="https://user-images.githubusercontent.com/20109646/124807637-969ba700-df90-11eb-95c4-17e9a1e32200.png";>
    <img width="438" alt="截圖 2021-07-08 上午1 59 43" 
src="https://user-images.githubusercontent.com/20109646/124807642-97343d80-df90-11eb-8841-91cd59d93500.png";>
    <img width="383" alt="截圖 2021-07-08 上午1 57 49" 
src="https://user-images.githubusercontent.com/20109646/124807649-98fe0100-df90-11eb-8393-e9ccbc4d0b10.png";>
    <img width="1439" alt="截圖 2021-07-08 上午1 58 00" 
src="https://user-images.githubusercontent.com/20109646/124807660-9ac7c480-df90-11eb-9a08-f27d27ef0e1f.png";>
    
    ### Questions:
    * Do the license files need updating? No
    * Are there breaking changes for older versions? No
    * Does this need new documentation? No
    
    Author: Kai-Hsun Chen <[email protected]>
    
    Signed-off-by: Kai-Hsun Chen <[email protected]>
    
    Closes #622 from kevin85421/SUBMARINE-880 and squashes the following 
commits:
    
    14fd0bc1 [Kai-Hsun Chen] Rename the functions
    e82b29bc [Kai-Hsun Chen] Refactor
    044317f0 [Kai-Hsun Chen] test
    55369731 [Kai-Hsun Chen] Refactor
    ce75ef9a [Kai-Hsun Chen] Use the experiment ID as tfjob name
    038c7ae6 [Kai-Hsun Chen] Refactor
    be562148 [Kai-Hsun Chen] Refactor
    79126cc2 [Kai-Hsun Chen] import ExperimentMeta
    d418f501 [Kai-Hsun Chen] Update
    2e8421f3 [Kai-Hsun Chen] Try to figure out the root cause of integration 
test errors
    d985db5f [Kai-Hsun Chen] Separate the experiment name from POD name
---
 .../submarine/server/api/spec/ExperimentMeta.java   | 21 +++++++++++++++++++++
 .../server/experiment/ExperimentManager.java        |  5 +++--
 .../submitter/k8s/parser/ExperimentSpecParser.java  |  5 ++++-
 .../server/submitter/k8s/util/MLJobConverter.java   |  1 -
 .../rest/ExperimentTemplateManagerRestApiIT.java    |  2 +-
 .../experiment-list/experiment-list.component.html  |  2 +-
 6 files changed, 30 insertions(+), 6 deletions(-)

diff --git 
a/submarine-server/server-api/src/main/java/org/apache/submarine/server/api/spec/ExperimentMeta.java
 
b/submarine-server/server-api/src/main/java/org/apache/submarine/server/api/spec/ExperimentMeta.java
index d682740..7791b89 100644
--- 
a/submarine-server/server-api/src/main/java/org/apache/submarine/server/api/spec/ExperimentMeta.java
+++ 
b/submarine-server/server-api/src/main/java/org/apache/submarine/server/api/spec/ExperimentMeta.java
@@ -25,6 +25,10 @@ import java.util.Map;
  * ExperimentMeta is metadata that all experiments must have.
  */
 public class ExperimentMeta {
+
+  public static final String SUBMARINE_EXPERIMENT_NAME = 
"submarine-experiment-name";
+
+  private String experimentId;
   private String name;
   private String namespace;
   private String framework;
@@ -52,6 +56,22 @@ public class ExperimentMeta {
   }
 
   /**
+   * Get the experiment id which is unique within a namespace.
+   * @return experiment id
+   */
+  public String getExperimentId() {
+    return experimentId;
+  }
+
+  /**
+   * experiment id must be unique within a namespace. Is required when 
creating experiment.
+   * @param experimentId experiment id
+   */
+  public void setExperimentId(String experimentId) {
+    this.experimentId = experimentId;
+  }
+
+  /**
    * Get the namespace which defines the isolated space for each experiment.
    * @return namespace
    */
@@ -136,6 +156,7 @@ public class ExperimentMeta {
   public String toString() {
     return "ExperimentMeta{" +
       "name='" + name + '\'' +
+      ", experimentId='" + experimentId + '\'' +
       ", namespace='" + namespace + '\'' +
       ", framework='" + framework + '\'' +
       ", cmd='" + cmd + '\'' +
diff --git 
a/submarine-server/server-core/src/main/java/org/apache/submarine/server/experiment/ExperimentManager.java
 
b/submarine-server/server-core/src/main/java/org/apache/submarine/server/experiment/ExperimentManager.java
index fe5757d..f61ff74 100644
--- 
a/submarine-server/server-core/src/main/java/org/apache/submarine/server/experiment/ExperimentManager.java
+++ 
b/submarine-server/server-core/src/main/java/org/apache/submarine/server/experiment/ExperimentManager.java
@@ -114,8 +114,10 @@ public class ExperimentManager {
     spec.getMeta().getEnvVars().put(RestConstants.SUBMARINE_TRACKING_URI, url);
     spec.getMeta().getEnvVars().put(RestConstants.LOG_DIR_KEY, 
RestConstants.LOG_DIR_VALUE);
 
-    String lowerName = spec.getMeta().getName().toLowerCase();
+    String lowerName = spec.getMeta().getName().toLowerCase(); 
     spec.getMeta().setName(lowerName);
+    spec.getMeta().setExperimentId(id.toString().replaceAll("_", "-"));
+    LOG.info(spec.getMeta().getExperimentId());
 
     Experiment experiment = submitter.createExperiment(spec);
     experiment.setExperimentId(id);
@@ -125,7 +127,6 @@ public class ExperimentManager {
     spec.getMeta().getEnvVars().remove(RestConstants.LOG_DIR_KEY);
 
     experiment.setSpec(spec);
-
     ExperimentEntity entity = buildEntityFromExperiment(experiment);
     experimentService.insert(entity);
 
diff --git 
a/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/parser/ExperimentSpecParser.java
 
b/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/parser/ExperimentSpecParser.java
index 0cdb550..2bc95bb 100644
--- 
a/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/parser/ExperimentSpecParser.java
+++ 
b/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/parser/ExperimentSpecParser.java
@@ -121,7 +121,10 @@ public class ExperimentSpecParser {
 
   private static V1ObjectMeta parseMetadata(ExperimentSpec experimentSpec) {
     V1ObjectMeta meta = new V1ObjectMeta();
-    meta.setName(experimentSpec.getMeta().getName());
+    meta.setName(experimentSpec.getMeta().getExperimentId());
+    Map<String, String> labels = new HashMap<>();
+    labels.put(ExperimentMeta.SUBMARINE_EXPERIMENT_NAME, 
experimentSpec.getMeta().getName());
+    meta.setLabels(labels);
     meta.setNamespace(experimentSpec.getMeta().getNamespace());
     return meta;
   }
diff --git 
a/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/util/MLJobConverter.java
 
b/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/util/MLJobConverter.java
index 4e077d2..99decf1 100644
--- 
a/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/util/MLJobConverter.java
+++ 
b/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/util/MLJobConverter.java
@@ -40,7 +40,6 @@ public class MLJobConverter {
     Experiment experiment = new Experiment();
     experiment.setUid(mlJob.getMetadata().getUid());
     experiment.setName(mlJob.getMetadata().getName());
-
     DateTime dateTime = mlJob.getMetadata().getCreationTimestamp();
     if (dateTime != null) {
       experiment.setAcceptedTime(dateTime.toString());
diff --git 
a/submarine-test/test-k8s/src/test/java/org/apache/submarine/rest/ExperimentTemplateManagerRestApiIT.java
 
b/submarine-test/test-k8s/src/test/java/org/apache/submarine/rest/ExperimentTemplateManagerRestApiIT.java
index 3c4de92..b0d7b75 100644
--- 
a/submarine-test/test-k8s/src/test/java/org/apache/submarine/rest/ExperimentTemplateManagerRestApiIT.java
+++ 
b/submarine-test/test-k8s/src/test/java/org/apache/submarine/rest/ExperimentTemplateManagerRestApiIT.java
@@ -223,7 +223,7 @@ public class ExperimentTemplateManagerRestApiIT extends 
AbstractSubmarineServerT
     Assert.assertEquals(Response.Status.OK.getStatusCode(), 
jsonResponse.getCode());
     
     ExperimentSpec expSpec = experiment.getSpec();
-    
+    LOG.info(expSpec.getMeta().toString());
     Assert.assertEquals(tplSubmit.getParams().get(TPL_SUBMIT_NAME_PARM), 
expSpec.getMeta().getName());
   }
 }
diff --git 
a/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-list/experiment-list.component.html
 
b/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-list/experiment-list.component.html
index 1738b5d..4741281 100644
--- 
a/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-list/experiment-list.component.html
+++ 
b/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-list/experiment-list.component.html
@@ -45,7 +45,7 @@
       <td>
         <label nz-checkbox [(ngModel)]="checkedList[i]"></label>
       </td>
-      <td>{{ data.name }}</td>
+      <td>{{ data.spec.meta.name }}</td>
       <td>
         <a [routerLink]="['info', data.experimentId]">
           {{ data.experimentId }}

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to