Forgot to add Xun in my last email.

On Thu, Nov 8, 2018 at 11:55 AM Wangda Tan <wheele...@gmail.com> wrote:

> Hi Robert,
>
> Submarine in 3.2.0 only supports the Docker container runtime; in future
> releases (maybe 3.2.1), we plan to add support for non-Docker containers.
>
> In order to try Submarine, you need to properly configure Docker on YARN
> first.
>
> You can check
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/src/site/markdown/InstallationScriptEN.md
> for an installation guide on how to properly set up Docker containers on
> multiple hosts. Submarine includes an interactive shell script to help you
> with the setup, so this should be straightforward. I've added Xun Liu, the
> original author of the interactive installation shell.
>
> Once you get Docker on YARN properly set up, you can follow
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/src/site/markdown/QuickStart.md
> to run your first application.
>
> Also, you can check the Submarine slides to better understand how it works.
> See: https://www.dropbox.com/s/wuv19b3rt9k2kq6/submarine-v0.pptx?dl=0
>
> If you have any questions, please don't hesitate to let us know.
>
> Thanks,
> Wangda
>
>
>
> On Thu, Nov 8, 2018 at 10:12 AM Robert Grandl <rgra...@yahoo.com.invalid>
> wrote:
>
>>  Thanks a lot for your reply.
>> Sunil,
>> I was trying to follow the steps from:
>> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/src/site/markdown/RunningDistributedCifar10TFJobs.md
>>
>> to run standalone TensorFlow using Submarine. I have installed Hadoop
>> 3.3.0-SNAPSHOT. However, when I run the following command:
>>
>> yarn jar path/to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar \
>>    job run --name tf-job-001 --verbose \
>>    --docker_image hadoopsubmarine/tf-1.8.0-gpu:0.0.1 \
>>    --input_path hdfs://default/dataset/cifar-10-data \
>>    --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \
>>    --env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 \
>>    --num_workers 1 --worker_resources memory=8G,vcores=2,gpu=1 \
>>    --worker_launch_cmd "cd /test/models/tutorials/image/cifar10_estimator && python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --train-steps=10000 --eval-batch-size=16 --train-batch-size=16 --num-gpus=2 --sync" \
>>    --tensorboard --tensorboard_docker_image wtan/tf-1.8.0-cpu:0.0.3
>>
>> I get the following error:
>>
>> 2018-11-07 21:48:55,831 INFO  [main] client.AHSProxy (AHSProxy.java:createAHSProxy(42)) - Connecting to Application History server at /128.105.144.236:10200
>> Exception in thread "main" java.lang.IllegalArgumentException: Unacceptable no of cpus specified, either zero or negative for component master (or at the global level)
>>         at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateServiceResource(ServiceApiUtil.java:457)
>>         at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateComponent(ServiceApiUtil.java:306)
>>         at org.apache.hadoop.yarn.service.utils.ServiceApiUtil.validateAndResolveService(ServiceApiUtil.java:237)
>>         at org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:496)
>>         at org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceJobSubmitter.submitJob(YarnServiceJobSubmitter.java:542)
>>         at org.apache.hadoop.yarn.submarine.client.cli.RunJobCli.run(RunJobCli.java:231)
>>         at org.apache.hadoop.yarn.submarine.client.cli.Cli.main(Cli.java:94)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
>>
>> It seems that I have not configured resources for a master component
>> somewhere, but I am having a hard time understanding where and what to
>> configure. I also looked at the design document you pointed to:
>> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#heading=h.vkxp9edl11m7
>>
>> It mentions a --master_resources flag; however, this flag is not
>> available in 3.3.0.
>> Could you please advise on how to proceed?
>>
>> Thank you,
>> - Robert
>>
>>     On Tuesday, November 6, 2018, 10:40:20 PM PST, Jonathan Hung <
>> jyhung2...@gmail.com> wrote:
>>
>> Hi Robert, I also encourage you to check out
>> https://github.com/linkedin/TonY (TensorFlow on YARN), a platform built
>> for this purpose.
>>
>> Jonathan
>> ________________________________
>> From: Sunil G <sun...@apache.org>
>> Sent: Tuesday, November 6, 2018 10:05:14 PM
>> To: Robert Grandl
>> Cc: yarn-...@hadoop.apache.org; yarn-dev-h...@hadoop.apache.org; General
>> Subject: Re: Run Distributed TensorFlow on YARN
>>
>> Hi Robert
>>
>> The Submarine project makes it easy to run distributed TensorFlow on top
>> of YARN. YARN-8220 <https://issues.apache.org/jira/browse/YARN-8220> was
>> an early attempt to do the same with custom scripts; Submarine avoids the
>> need for such scripts and instead lets you run TensorFlow much like a
>> distributed shell command, using the Submarine jar. Please refer to the
>> doc below for a deep dive.
>>
>> https://docs.google.com/document/d/199J4pB3blqgV9SCNvBbTqkEoQdjoyGMjESV4MktCo0k/edit#heading=h.vkxp9edl11m7
>>
>> Submarine will be released as part of Hadoop 3.2.0, which will be
>> officially out in the coming weeks. If you need it sooner, you are free
>> to run it from Hadoop trunk.
>>
>> For now, you can refer to the Submarine docs in the Hadoop repo (trunk)
>> under
>> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/src/site/markdown/
>> or at
>> https://github.com/apache/hadoop/tree/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/src/site/markdown
>>
>> Thanks
>> Sunil
>>
>>
>> On Wed, Nov 7, 2018 at 10:34 AM Robert Grandl <rgra...@yahoo.com.invalid>
>> wrote:
>>
>> >  Hi all,
>> > I am wondering whether there is currently any stable support for
>> > running distributed TensorFlow atop YARN.
>> > I found this blog post from Hortonworks. It seems this is possible
>> > starting with YARN 3.1.0.
>> > https://hortonworks.com/blog/distributed-tensorflow-assembly-hadoop-yarn/
>> >
>> > Also, I found some more recent JIRAs:
>> > https://issues.apache.org/jira/browse/YARN-8220
>> > https://issues.apache.org/jira/browse/YARN-8135
>> > which suggest using something called Submarine.
>> >
>> > However, I could not find any proper documentation or instructions for
>> > using any of these.
>> >
>> > Can someone help me with this?
>> > Otherwise, is there better support for running any other machine
>> > learning framework on YARN?
>> > Thank you in advance,
>> > - Robert
>> >
>
>

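For anyone hitting the same "Unacceptable no of cpus specified" error: the exception names component master rather than the worker, so a reasonable first check is whether the --worker_resources string itself carries a positive vcores value. The bash function below is a hypothetical helper written for this thread. It is not part of Submarine or YARN, and it does not reproduce ServiceApiUtil's actual validation logic. It assumes the resource spec is a comma-separated list of key=value pairs, as in memory=8G,vcores=2,gpu=1.

```bash
#!/usr/bin/env bash
# Hypothetical helper for this thread; not part of Submarine or YARN.
# Prints the vcores value from a resource spec such as
# "memory=8G,vcores=2,gpu=1" and returns non-zero if vcores is
# missing, zero, negative, or not an integer.
check_vcores() {
  local spec="$1" pair key value vcores=""
  local -a pairs
  # Split the spec on commas into key=value pairs.
  IFS=',' read -ra pairs <<< "$spec"
  for pair in "${pairs[@]}"; do
    key="${pair%%=*}"     # text before the first '='
    value="${pair#*=}"    # text after the first '='
    if [ "$key" = "vcores" ]; then
      vcores="$value"
    fi
  done
  # Reject anything that is not a positive integer.
  if ! [[ "$vcores" =~ ^[0-9]+$ ]] || [ "$vcores" -le 0 ]; then
    printf "invalid or missing vcores in '%s'\n" "$spec" >&2
    return 1
  fi
  printf '%s\n' "$vcores"
}
```

Running check_vcores "memory=8G,vcores=2,gpu=1" prints 2, which suggests the worker spec in the command above is not where the zero-cpu value comes from. By contrast, check_vcores "memory=8G,gpu=1" reports an error and returns a non-zero status.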