This is an automated email from the ASF dual-hosted git repository.
kaihsun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git
The following commit(s) were added to refs/heads/master by this push:
new 8da9f47 SUBMARINE-880. Cannot run two experiments with the same name
8da9f47 is described below
commit 8da9f478de9323dd098d06bbd52afdda3a27ce07
Author: Kai-Hsun Chen <[email protected]>
AuthorDate: Fri Jul 9 00:39:55 2021 +0800
SUBMARINE-880. Cannot run two experiments with the same name
### What is this PR for?
We cannot run two experiments with the same name. Take experimentIT.java as
an example, the frontend E2E testcase will create an experiment named
"experiment-e2e-test". If we run this testcase twice, the workbench just shows
an "experiment-e2e-test" rather than two.
The root cause is that the K8sSubmitter will create two PODs for an
"experiment-e2e-test". The names of these two PODs are
"experiment-e2e-test-ps-0" and "experiment-e2e-test-worker-0". Hence, when we
try to create the second "experiment-e2e-test", K8sSubmitter will throw an
exception due to duplicate POD names.
To reproduce the bug:
```bash
# Step1: Run workbench on port 8080
# Step2:
cd submarine-cloud-v2
# Step3: Create "experiment-e2e-test" twice
./hack/run_frontend_e2e.sh experimentIT
./hack/run_frontend_e2e.sh experimentIT
# Step4: Check Workbench: only an "experiment-e2e-test" exists => BUG!
```
* My solution: (Outdated)
* The variable "name" indicates the name of Custom Resource (ex: tfjobs,
pytorchjobs) in k8s.
* The variable "expName" indicates the name of experiment.
* ${name} = ${expName} + "-" + ${experimentCounter}
* For example: an user creates an experiment "experiment-e2e-test"
(${expName})
* On the workbench, the experiment name is "experiment-e2e-test".
(${expName})
* The name of the tfjobs (a CRD) is "experiment-e2e-test-0001".
(${name})
* The value of the label "submarine-experiment-name" in the TFJob is
"experiment-e2e-test" (${expName})
* The names of PODs created by the TFJob "experiment-e2e-test-0001" are:
```
experiment-e2e-test-0001-ps-0
experiment-e2e-test-0001-worker-0
```
### What type of PR is it?
[Bug Fix]
### Todos
* Check "_" (Reference:
[Link](https://blog.csdn.net/airangrong6572/article/details/101273245))
### What is the Jira issue?
https://issues.apache.org/jira/browse/SUBMARINE-880
### How should this be tested?
```bash
# Step1: Run workbench on port 8080
# Step2:
cd submarine-cloud-v2
# Step3: Create "experiment-e2e-test" twice
./hack/run_frontend_e2e.sh experimentIT
./hack/run_frontend_e2e.sh experimentIT
# Step4: Check Workbench: two"experiment-e2e-test" exists (Correct)
```
### Screenshots (if appropriate)
<img width="1141" alt="截圖 2021-07-08 上午1 58 10"
src="https://user-images.githubusercontent.com/20109646/124807637-969ba700-df90-11eb-95c4-17e9a1e32200.png">
<img width="438" alt="截圖 2021-07-08 上午1 59 43"
src="https://user-images.githubusercontent.com/20109646/124807642-97343d80-df90-11eb-8841-91cd59d93500.png">
<img width="383" alt="截圖 2021-07-08 上午1 57 49"
src="https://user-images.githubusercontent.com/20109646/124807649-98fe0100-df90-11eb-8393-e9ccbc4d0b10.png">
<img width="1439" alt="截圖 2021-07-08 上午1 58 00"
src="https://user-images.githubusercontent.com/20109646/124807660-9ac7c480-df90-11eb-9a08-f27d27ef0e1f.png">
### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need new documentation? No
Author: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Closes #622 from kevin85421/SUBMARINE-880 and squashes the following
commits:
14fd0bc1 [Kai-Hsun Chen] Rename the functions
e82b29bc [Kai-Hsun Chen] Refactor
044317f0 [Kai-Hsun Chen] test
55369731 [Kai-Hsun Chen] Refactor
ce75ef9a [Kai-Hsun Chen] Use the experiment ID as tfjob name
038c7ae6 [Kai-Hsun Chen] Refactor
be562148 [Kai-Hsun Chen] Refactor
79126cc2 [Kai-Hsun Chen] import ExperimentMeta
d418f501 [Kai-Hsun Chen] Update
2e8421f3 [Kai-Hsun Chen] Try to figure out the root cause of integration
test errors
d985db5f [Kai-Hsun Chen] Separate the experiment name from POD name
---
.../submarine/server/api/spec/ExperimentMeta.java | 21 +++++++++++++++++++++
.../server/experiment/ExperimentManager.java | 5 +++--
.../submitter/k8s/parser/ExperimentSpecParser.java | 5 ++++-
.../server/submitter/k8s/util/MLJobConverter.java | 1 -
.../rest/ExperimentTemplateManagerRestApiIT.java | 2 +-
.../experiment-list/experiment-list.component.html | 2 +-
6 files changed, 30 insertions(+), 6 deletions(-)
diff --git
a/submarine-server/server-api/src/main/java/org/apache/submarine/server/api/spec/ExperimentMeta.java
b/submarine-server/server-api/src/main/java/org/apache/submarine/server/api/spec/ExperimentMeta.java
index d682740..7791b89 100644
---
a/submarine-server/server-api/src/main/java/org/apache/submarine/server/api/spec/ExperimentMeta.java
+++
b/submarine-server/server-api/src/main/java/org/apache/submarine/server/api/spec/ExperimentMeta.java
@@ -25,6 +25,10 @@ import java.util.Map;
* ExperimentMeta is metadata that all experiments must have.
*/
public class ExperimentMeta {
+
+ public static final String SUBMARINE_EXPERIMENT_NAME =
"submarine-experiment-name";
+
+ private String experimentId;
private String name;
private String namespace;
private String framework;
@@ -52,6 +56,22 @@ public class ExperimentMeta {
}
/**
+ * Get the experiment id which is unique within a namespace.
+ * @return experiment id
+ */
+ public String getExperimentId() {
+ return experimentId;
+ }
+
+ /**
+ * experiment id must be unique within a namespace. Is required when
creating experiment.
+ * @param experimentId experiment id
+ */
+ public void setExperimentId(String experimentId) {
+ this.experimentId = experimentId;
+ }
+
+ /**
* Get the namespace which defines the isolated space for each experiment.
* @return namespace
*/
@@ -136,6 +156,7 @@ public class ExperimentMeta {
public String toString() {
return "ExperimentMeta{" +
"name='" + name + '\'' +
+ ", experimentId='" + experimentId + '\'' +
", namespace='" + namespace + '\'' +
", framework='" + framework + '\'' +
", cmd='" + cmd + '\'' +
diff --git
a/submarine-server/server-core/src/main/java/org/apache/submarine/server/experiment/ExperimentManager.java
b/submarine-server/server-core/src/main/java/org/apache/submarine/server/experiment/ExperimentManager.java
index fe5757d..f61ff74 100644
---
a/submarine-server/server-core/src/main/java/org/apache/submarine/server/experiment/ExperimentManager.java
+++
b/submarine-server/server-core/src/main/java/org/apache/submarine/server/experiment/ExperimentManager.java
@@ -114,8 +114,10 @@ public class ExperimentManager {
spec.getMeta().getEnvVars().put(RestConstants.SUBMARINE_TRACKING_URI, url);
spec.getMeta().getEnvVars().put(RestConstants.LOG_DIR_KEY,
RestConstants.LOG_DIR_VALUE);
- String lowerName = spec.getMeta().getName().toLowerCase();
+ String lowerName = spec.getMeta().getName().toLowerCase();
spec.getMeta().setName(lowerName);
+ spec.getMeta().setExperimentId(id.toString().replaceAll("_", "-"));
+ LOG.info(spec.getMeta().getExperimentId());
Experiment experiment = submitter.createExperiment(spec);
experiment.setExperimentId(id);
@@ -125,7 +127,6 @@ public class ExperimentManager {
spec.getMeta().getEnvVars().remove(RestConstants.LOG_DIR_KEY);
experiment.setSpec(spec);
-
ExperimentEntity entity = buildEntityFromExperiment(experiment);
experimentService.insert(entity);
diff --git
a/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/parser/ExperimentSpecParser.java
b/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/parser/ExperimentSpecParser.java
index 0cdb550..2bc95bb 100644
---
a/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/parser/ExperimentSpecParser.java
+++
b/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/parser/ExperimentSpecParser.java
@@ -121,7 +121,10 @@ public class ExperimentSpecParser {
private static V1ObjectMeta parseMetadata(ExperimentSpec experimentSpec) {
V1ObjectMeta meta = new V1ObjectMeta();
- meta.setName(experimentSpec.getMeta().getName());
+ meta.setName(experimentSpec.getMeta().getExperimentId());
+ Map<String, String> labels = new HashMap<>();
+ labels.put(ExperimentMeta.SUBMARINE_EXPERIMENT_NAME,
experimentSpec.getMeta().getName());
+ meta.setLabels(labels);
meta.setNamespace(experimentSpec.getMeta().getNamespace());
return meta;
}
diff --git
a/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/util/MLJobConverter.java
b/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/util/MLJobConverter.java
index 4e077d2..99decf1 100644
---
a/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/util/MLJobConverter.java
+++
b/submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/util/MLJobConverter.java
@@ -40,7 +40,6 @@ public class MLJobConverter {
Experiment experiment = new Experiment();
experiment.setUid(mlJob.getMetadata().getUid());
experiment.setName(mlJob.getMetadata().getName());
-
DateTime dateTime = mlJob.getMetadata().getCreationTimestamp();
if (dateTime != null) {
experiment.setAcceptedTime(dateTime.toString());
diff --git
a/submarine-test/test-k8s/src/test/java/org/apache/submarine/rest/ExperimentTemplateManagerRestApiIT.java
b/submarine-test/test-k8s/src/test/java/org/apache/submarine/rest/ExperimentTemplateManagerRestApiIT.java
index 3c4de92..b0d7b75 100644
---
a/submarine-test/test-k8s/src/test/java/org/apache/submarine/rest/ExperimentTemplateManagerRestApiIT.java
+++
b/submarine-test/test-k8s/src/test/java/org/apache/submarine/rest/ExperimentTemplateManagerRestApiIT.java
@@ -223,7 +223,7 @@ public class ExperimentTemplateManagerRestApiIT extends
AbstractSubmarineServerT
Assert.assertEquals(Response.Status.OK.getStatusCode(),
jsonResponse.getCode());
ExperimentSpec expSpec = experiment.getSpec();
-
+ LOG.info(expSpec.getMeta().toString());
Assert.assertEquals(tplSubmit.getParams().get(TPL_SUBMIT_NAME_PARM),
expSpec.getMeta().getName());
}
}
diff --git
a/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-list/experiment-list.component.html
b/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-list/experiment-list.component.html
index 1738b5d..4741281 100644
---
a/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-list/experiment-list.component.html
+++
b/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-list/experiment-list.component.html
@@ -45,7 +45,7 @@
<td>
<label nz-checkbox [(ngModel)]="checkedList[i]"></label>
</td>
- <td>{{ data.name }}</td>
+ <td>{{ data.spec.meta.name }}</td>
<td>
<a [routerLink]="['info', data.experimentId]">
{{ data.experimentId }}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]