[ https://issues.apache.org/jira/browse/SUBMARINE-308?focusedWorklogId=370328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-370328 ]
ASF GitHub Bot logged work on SUBMARINE-308:
--------------------------------------------
Author: ASF GitHub Bot
Created on: 11/Jan/20 09:16
Start Date: 11/Jan/20 09:16
Worklog Time Spent: 10m
Work Description: asfgit commented on pull request #145: SUBMARINE-308. Support PyTorch with TonY runtime
URL: https://github.com/apache/submarine/pull/145
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 370328)
Time Spent: 20m (was: 10m)
> Support PyTorch with TonY runtime
> ---------------------------------
>
> Key: SUBMARINE-308
> URL: https://issues.apache.org/jira/browse/SUBMARINE-308
> Project: Apache Submarine
> Issue Type: Sub-task
> Components: Backend Server, Doc
> Reporter: huiyangjian
> Assignee: Kevin Su
> Priority: Major
> Labels: pull-request-available
> Attachments: localization-error.jpg, no-pytorch.jpg, yarn-mount-error.jpg
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> 1. Please support both standalone and distributed modes.
> 2. Please make a PyTorch example available in the Submarine GitHub repository.
> 3. Following the PyTorch example in the documentation, I ran the script below,
> but it did not succeed.
> {code:java}
> CLASSPATH=`hadoop classpath --glob`:/home/bin/hadoop/share/hadoop/yarn/submarine-all-0.3.0-SNAPSHOT-hadoop-3.2.jar \
> java org.apache.submarine.client.cli.Cli \
> job run --name ${APP_NAME} \
> --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \
> --env DOCKER_HADOOP_HDFS_HOME=/app/hadoop-3.2.1 \
> --env HADOOP_HOME=/hadoop-3.2.1 \
> --env HADOOP_YARN_HOME=/hadoop-3.2.1 \
> --env HADOOP_COMMON_HOME=/hadoop-3.2.1 \
> --env HADOOP_HDFS_HOME=/hadoop-3.2.1 \
> --env HADOOP_CONF_DIR=/hadoop-3.2.1/etc/hadoop \
> --env PYTHONUNBUFFERED="0" \
> --env TZ="Asia/Shanghai" \
> --env YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=${LOCAL_PATH}/:/home/test \
> --queue dev \
> --input_path hdfs://cluster/user/work/tensorflow/data/ \
> --docker_image jx-bd-hadoop13.zeus.lianjia.com:801/runonce/tf-1.13.1-pytorch-0.4-gpu-base:0.0.1 \
> --num_workers 1 \
> --worker_resources memory=16G,vcores=2,gpu=1 \
> --worker_launch_cmd "export CLASSPATH=\$(/app/hadoop-3.2.1/bin/hadoop classpath --glob) && cd /home/test/pth && python ../pth/train_pth.py" \
> --localization /home/local/test/cifar10_estimator:./submarine_algorithm \
> --verbose \
> --conf tony.containers.resources=/home/bin/hadoop/share/hadoop/yarn/submarine-all-0.3.0-SNAPSHOT-hadoop-3.2.jar \
> --conf tony.application.framework=pytorch
> {code}
>