ByronHsu opened a new pull request #483:
URL: https://github.com/apache/submarine/pull/483
### What is this PR for?
Add a new feature for 0.6.0: tensorboard integration.
- Usage
1. Create a job request that uses tensorboard
2. Write the tensorboard logs to `/logs/mylog`. (The subpath is required
because of this tf-operator issue:
[https://github.com/kubeflow/tf-operator/issues/1053](https://github.com/kubeflow/tf-operator/issues/1053).
Log files cannot be written directly to the mountPath.)
3. Open `http://<host>:<port>/tfboard-${job-name}` to monitor the job in
tensorboard.
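
The subpath rule in step 2 can be checked before a job is submitted. The following is a minimal sketch, not code from this PR: the helper names `extract_log_dir` and `is_valid_log_dir` are hypothetical, and it assumes the mount path is the hard-coded `/logs` and that the training script takes a `--log_dir` flag as in the example request.

```python
import shlex
from pathlib import PurePosixPath

# Mount path as described in this PR (currently hard-coded as /logs).
MOUNT_PATH = PurePosixPath("/logs")


def extract_log_dir(cmd):
    """Return the value of --log_dir from a command line, or None if absent."""
    tokens = shlex.split(cmd)
    for i, token in enumerate(tokens):
        if token.startswith("--log_dir="):
            return token.split("=", 1)[1]
        if token == "--log_dir" and i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def is_valid_log_dir(log_dir):
    """True if log_dir is a strict subdirectory of the mount path.

    Hypothetical check: it does not resolve '..' segments or symlinks.
    """
    return MOUNT_PATH in PurePosixPath(log_dir).parents


if __name__ == "__main__":
    cmd = "python train.py --log_dir=/logs/mylog --batch_size=20"
    log_dir = extract_log_dir(cmd)
    print(log_dir, is_valid_log_dir(log_dir))
```

A path equal to the mount path itself (`/logs`) is rejected, which matches the restriction described in step 2.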
- Implementation
When creating a new job, the backend creates not only the original
experiment but also the k8s resources that tensorboard requires.
These resources fall into two categories:
1. Storage
2. Tensorboard serving
**Storage**
The resources required for storage are a **persistent volume** and a
**persistent volume claim**.
I back the persistent volume with a host path, and mount it into both the
ML job (so the job can write logs to the volume) and Tensorboard (so
tfboard can read the logs).
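
As a rough illustration of the storage resources, the two manifests could look like the sketch below, written as Python dicts. This is not the backend code from this PR: the function name, resource names, host path, size, and access mode are all assumptions chosen for the example.

```python
def storage_manifests(job_name, host_path="/tmp/tfboard-logs", size="1Gi"):
    """Build a hostPath PersistentVolume and a claim bound to it.

    All names and defaults here are hypothetical, for illustration only.
    """
    pv_name = f"tfboard-pv-{job_name}"
    pvc_name = f"tfboard-pvc-{job_name}"
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            "capacity": {"storage": size},
            # Assumed access mode; both the job and tensorboard mount the claim.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",
            "hostPath": {"path": host_path},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",
            # Bind explicitly to the volume created above.
            "volumeName": pv_name,
            "resources": {"requests": {"storage": size}},
        },
    }
    return pv, pvc
```

The ML job and the tensorboard pod would then both reference the claim by name in their `volumes` section.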
**Tensorboard Serving**
The resources required here are a **deployment, service, and
ingressroute**.
I create the tensorboard app with a deployment and a service, and then
expose it at a custom path with the help of an ingressroute.
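
The serving side could be sketched along the same lines. The example below assumes that the ingressroute is a Traefik `IngressRoute` custom resource, that tensorboard listens on its default port 6006, and that the claim name comes from the storage step; the image, entry point, service port, and all resource names are hypothetical and not taken from this PR.

```python
def serving_manifests(job_name, pvc_name, namespace="default"):
    """Build Deployment, Service, and IngressRoute dicts for one tensorboard."""
    name = f"tfboard-{job_name}"
    labels = {"app": name}
    meta = {"name": name, "namespace": namespace}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": meta,
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "tensorboard",
                        "image": "tensorflow/tensorflow",  # assumed image
                        # Serve under the custom path instead of "/".
                        "command": ["tensorboard", "--logdir=/logs",
                                    f"--path_prefix=/{name}"],
                        "ports": [{"containerPort": 6006}],
                        "volumeMounts": [{"name": "logs", "mountPath": "/logs"}],
                    }],
                    "volumes": [{
                        "name": "logs",
                        "persistentVolumeClaim": {"claimName": pvc_name},
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": meta,
        "spec": {"selector": labels,
                 "ports": [{"port": 8080, "targetPort": 6006}]},
    }
    ingressroute = {
        "apiVersion": "traefik.containo.us/v1alpha1",  # assumes Traefik CRDs
        "kind": "IngressRoute",
        "metadata": meta,
        "spec": {
            "entryPoints": ["web"],  # assumed entry point name
            "routes": [{
                "kind": "Rule",
                "match": f"PathPrefix(`/{name}`)",
                "services": [{"name": name, "port": 8080}],
            }],
        },
    }
    return deployment, service, ingressroute
```

The route prefix mirrors the `/tfboard-${job-name}` path from the usage steps, so each job gets its own tensorboard endpoint.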
- Example
- tensorboard-example.json
```json
{
  "meta": {
    "name": "tensorflow-dist-mnist-byron-1234",
    "namespace": "default",
    "framework": "TensorFlow",
    "cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/logs/mylog --learning_rate=0.01 --batch_size=20",
    "envVars": {
      "ENV_1": "ENV1"
    }
  },
  "environment": {
    "image": "apache/submarine:tf-mnist-with-summaries-1.0"
  },
  "spec": {
    "Worker": {
      "replicas": 1,
      "resources": "cpu=1,memory=1024M"
    }
  }
}
```
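
For the request above, the tensorboard address can be derived from the job name. This is a small sketch under two assumptions: that `${job-name}` in the usage steps corresponds to `meta.name`, and that the address takes a host and a port. The function name and the host and port values are placeholders.

```python
import json

# Trimmed copy of the example request; only meta.name is needed here.
REQUEST = """
{
  "meta": {
    "name": "tensorflow-dist-mnist-byron-1234",
    "namespace": "default"
  }
}
"""


def tensorboard_url(request, host, port):
    """Return the tensorboard address for a parsed job request."""
    job_name = request["meta"]["name"]
    return f"http://{host}:{port}/tfboard-{job_name}"


if __name__ == "__main__":
    # "localhost" and 8080 are placeholder values for this sketch.
    print(tensorboard_url(json.loads(REQUEST), "localhost", 8080))
```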

### What type of PR is it?
[Feature]
### Todos
- [ ] Frontend support
- [ ] Job logs cannot be written directly to the mountPath (as
described above). We should fix this problem.
- [ ] Make the log path configurable (currently, it is hard-coded as `/logs`)
- [ ] Support smb-server for shared storage
### What is the Jira issue?
https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-701
### How should this be tested?
https://travis-ci.org/github/ByronHsu/submarine/jobs/751658488
### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No