ByronHsu opened a new pull request #484:
URL: https://github.com/apache/submarine/pull/484


   ### What is this PR for?
   
   Adds a new feature for 0.6.0: TensorBoard integration.
   
   - Usage
       1. Create a job request that uses tensorboard
       2. Write the TensorBoard logs to a subpath such as `/logs/mylog`. The subpath is required because of this issue: [https://github.com/kubeflow/tf-operator/issues/1053](https://github.com/kubeflow/tf-operator/issues/1053). Log files cannot be written directly to the mountPath.
       3. Open `http://<host>:<port>/tfboard-${job-name}` to monitor the job in TensorBoard.
   - Implementation
   
       When creating a new job, the backend creates not only the original experiment but also the k8s resources that TensorBoard requires.
   
       The resources can be classified into two categories: 
   
       1. Storage
       2. Tensorboard serving
   
       **Storage**
   
       The resources required for storage are a **persistent volume** and a **persistent volume claim**.
   
       I back the persistent volume with a host path and mount it into both the ML job (so the job can write logs to the volume) and TensorBoard (so tfboard can read those logs).
   
       **Tensorboard Serving**
   
       The resources required here are a **deployment**, a **service**, and an **ingressroute**.
   
       I create the TensorBoard app with a deployment and a service, and then route a custom path to it with the help of an ingressroute.
   
   - Example
       - tensorboard-example.json
   
           ```json
           {
             "meta": {
               "name": "tensorflow-dist-mnist-byron-1234",
               "namespace": "default",
               "framework": "TensorFlow",
               "cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/logs/mylog --learning_rate=0.01 --batch_size=20",
               "envVars": {
                 "ENV_1": "ENV1"
               }
             },
             "environment": {
               "image": "apache/submarine:tf-mnist-with-summaries-1.0"
             },
             "spec": {
               "Worker": {
                 "replicas": 1,
                 "resources": "cpu=1,memory=1024M"
               }
             }
           }
           ```
    
           ![Kapture 2020-12-27 at 18 04 40](https://user-images.githubusercontent.com/24364830/103168607-926b3000-486f-11eb-9f73-ecfcf71625a1.gif)
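
The usage steps above can be sketched in a few lines of Python. This is only an illustration, not code from this PR: the helper names `is_valid_log_dir` and `tensorboard_url` are hypothetical, the port value is made up, and the URL pattern assumes the address is a host and port followed by `/tfboard-<job-name>`.

```python
import posixpath

# The log mount path is hard-coded as /logs in this PR (see Todos).
LOG_ROOT = "/logs"


def is_valid_log_dir(log_dir):
    """Return True if log_dir is a strict subpath of LOG_ROOT.

    Writing directly to the mount path itself is not supported,
    so "/logs" is rejected while "/logs/mylog" is accepted.
    """
    normalized = posixpath.normpath(log_dir)
    return normalized.startswith(LOG_ROOT + "/")


def tensorboard_url(host, port, job_name):
    """Build the TensorBoard address for a job (assumed URL pattern)."""
    return f"http://{host}:{port}/tfboard-{job_name}"


# Example with the job name from tensorboard-example.json.
# The host and port below are placeholders for a local cluster.
job_name = "tensorflow-dist-mnist-byron-1234"
if is_valid_log_dir("/logs/mylog"):
    print(tensorboard_url("localhost", 32080, job_name))
```

A check like this could run before submitting the job request, so that a `--log_dir` pointing at the bare mount path is caught early.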
   
   
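To make the Implementation notes more concrete, here is a rough sketch of the five resources created per job, written as plain Python dictionaries. It is not the backend code from this PR. All resource names, the storage size, the host path, the TensorBoard image, the service port, and the Traefik `IngressRoute` API group and entry point are assumptions chosen for illustration.

```python
LOG_MOUNT_PATH = "/logs"  # hard-coded log path mentioned in the Todos


def tensorboard_resources(job_name, namespace="default",
                          host_path="/tmp/tfboard-logs"):
    """Return illustrative manifests for one job's TensorBoard stack.

    Every name and value here is hypothetical; the real backend may
    name and size these resources differently.
    """
    name = f"tfboard-{job_name}"
    labels = {"app": name}

    # Storage: a host-path persistent volume and a claim bound to it.
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": f"{name}-pv"},
        "spec": {
            "capacity": {"storage": "1Gi"},
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",
            "hostPath": {"path": f"{host_path}/{job_name}"},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{name}-pvc", "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",
            "volumeName": f"{name}-pv",
            "resources": {"requests": {"storage": "1Gi"}},
        },
    }

    # Serving: TensorBoard reads the same volume the job writes to.
    container = {
        "name": "tensorboard",
        "image": "tensorflow/tensorflow",  # assumed image
        "command": ["tensorboard", f"--logdir={LOG_MOUNT_PATH}",
                    f"--path_prefix=/{name}"],
        "ports": [{"containerPort": 6006}],
        "volumeMounts": [{"name": "logs", "mountPath": LOG_MOUNT_PATH}],
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [{
                        "name": "logs",
                        "persistentVolumeClaim": {"claimName": f"{name}-pvc"},
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{name}-svc", "namespace": namespace},
        "spec": {"selector": labels,
                 "ports": [{"port": 8080, "targetPort": 6006}]},
    }

    # Routing: map the custom path /tfboard-<job-name> to the service.
    ingress_route = {
        "apiVersion": "traefik.containo.us/v1alpha1",  # assumed API group
        "kind": "IngressRoute",
        "metadata": {"name": f"{name}-ingressroute", "namespace": namespace},
        "spec": {
            "entryPoints": ["web"],
            "routes": [{
                "kind": "Rule",
                "match": f"PathPrefix(`/{name}`)",
                "services": [{"kind": "Service", "name": f"{name}-svc",
                              "port": 8080}],
            }],
        },
    }
    return [pv, pvc, deployment, service, ingress_route]


for resource in tensorboard_resources("tensorflow-dist-mnist-byron-1234"):
    print(resource["kind"], resource["metadata"]["name"])
```

The sketch keeps the two categories visible: the volume and claim carry the logs, and the deployment, service, and ingressroute serve them under the job-specific path.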
   ### What type of PR is it?
   [Feature]
   
   ### Todos
   - [ ] Frontend support
   - [ ] Job logs cannot be written directly to the mountPath (as described above). We should fix this problem.
   - [ ] Make the log path configurable (currently it is hard-coded as `/logs`)
   - [ ] Support smb-server for shared storage
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-701
   
   ### How should this be tested?
   https://travis-ci.org/github/ByronHsu/submarine/jobs/751658488
   
   ### Questions:
   * Do the license files need an update? No
   * Are there breaking changes for older versions? No
   * Does this need documentation? No
   

