Hi Ruoyun,
The output file is written inside the container, which is deleted after
shutdown by default. You can keep the containers by adding the flag
--retain_docker_containers
Note that this flag comes from ManualDockerEnvironmentOptions.
The problem with batch is that it executes in stages and creates
multiple containers [1], which don't share a local file system. So
the wordcount only works reliably if you use a distributed file system.
Cheers,
Max
[1] You can prevent multiple containers by using
--environment_cache_millis=10000
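
As a rough sketch of how the two flags above combine with the wordcount command quoted further down in this thread, the helper below only assembles the argument list. The function name and its parameters are hypothetical; the flags themselves are copied from this thread, and whether they must be given to the pipeline or to the job server is not settled here.

```python
import shlex

# Flags quoted from the wordcount command in this thread; nothing else
# is assumed about Beam's command line.
BASE_ARGS = [
    "--input=/etc/profile",
    "--output=/tmp/py-wordcount-direct",
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--parallelism=1",
    "--streaming",
]


def wordcount_command(retain_containers=False, environment_cache_millis=None):
    """Return the argv list for the wordcount example (hypothetical helper)."""
    argv = ["python", "-m", "apache_beam.examples.wordcount"] + BASE_ARGS
    if retain_containers:
        # Keep the containers after shutdown so their files can be inspected.
        argv.append("--retain_docker_containers")
    if environment_cache_millis is not None:
        # Footnote [1]: cache the environment to avoid multiple containers.
        argv.append("--environment_cache_millis=%d" % environment_cache_millis)
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(" ".join(shlex.quote(a) for a in wordcount_command(True, 10000)))
```

Building the list instead of a single string keeps each flag separate, so it can also be handed to subprocess without shell quoting issues.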
On 14.11.18 20:44, Ruoyun Huang wrote:
Thanks Thomas!
My desktop runs Linux. I was using Gradle to run wordcount, and that
was how I got the job hanging. Since both of you got it working, I guess
it is more likely that something is wrong with my setup.
By using Thomas's python command line exactly as is, I can see the job
run succeed. However, I have two questions:
1) Did you check whether the output file "/tmp/py-wordcount-direct"
exists or not? I expected a text output, but I don't see this file
afterwards. (I am still building confidence in telling what a
successful run looks like. Maybe I will try DataflowRunner and
cross-check the outputs.)
2) Why does it need a "--streaming" arg? Isn't this a static batch
input, since it is fed a txt file? In fact, I get a failure message if I
remove '--streaming'; not sure if that is due to my setup again.
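
One way to look for the output from question 1 is sketched below. It assumes the usual Beam text sink behavior of appending a shard suffix (for example -00000-of-00001) to the --output prefix, so no file with the bare name would exist. The helper name is hypothetical, and the check only covers the local file system, not files left inside a container.

```python
import os
import tempfile


def find_output_shards(output_prefix):
    """Return all files whose names start with output_prefix (hypothetical helper)."""
    directory, base = os.path.split(output_prefix)
    directory = directory or "."
    if not os.path.isdir(directory):
        return []
    # Match on the file-name prefix so any shard suffix is accepted.
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith(base)
    )


if __name__ == "__main__":
    # Self-contained demo in a scratch directory; the shard name is made up.
    scratch = tempfile.mkdtemp()
    open(os.path.join(scratch, "py-wordcount-direct-00000-of-00001"), "w").close()
    print(find_output_shards(os.path.join(scratch, "py-wordcount-direct")))
    # Against a real run, check the prefix that was passed via --output.
    print(find_output_shards("/tmp/py-wordcount-direct"))
```

An empty list for the real prefix would be consistent with the output having been written somewhere other than the local /tmp.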
On Wed, Nov 14, 2018 at 7:51 AM Thomas Weise <t...@apache.org> wrote:
Works for me on macOS as well.
In case you don't launch the pipeline through Gradle, this would be
the command:
python -m apache_beam.examples.wordcount \
--input=/etc/profile \
--output=/tmp/py-wordcount-direct \
--runner=PortableRunner \
--job_endpoint=localhost:8099 \
--parallelism=1 \
--OPTIONALflink_master=localhost:8081 \
--streaming
We talked about adding the wordcount to pre-commit.
Regarding using ULR vs. Flink runner: There seems to be confusion
between PortableRunner using the user-supplied endpoint vs. trying
to launch a job server. I commented in the doc.
Thomas
On Wed, Nov 14, 2018 at 3:30 AM Maximilian Michels <m...@apache.org> wrote:
Hi Ruoyun,
I just ran the wordcount locally using the instructions on the page.
I've tried the local file system and GCS. Both times it ran successfully
and produced valid output.
I'm assuming there is some problem with your setup. Which platform are
you using? I'm on macOS.
Could you expand on the planned merge? From my understanding we will
always need PortableRunner in Python to be able to submit against the
Beam JobServer.
Thanks,
Max
On 14.11.18 00:39, Ruoyun Huang wrote:
> A quick follow-up on using the current PortableRunner.
>
> I followed the exact three steps that Ankur and Maximilian shared in
> https://beam.apache.org/roadmap/portability/#python-on-flink. The
> wordcount example is still hanging after 10 minutes. I also tried
> specifying explicit input/output args, using either a GCS folder or the
> local file system, but neither works.
>
> I spent some time looking into it but have no conclusion yet. At this
> point, though, I guess it does not matter much any more, given we
> already have the plan of merging PortableRunner into using the Java
> reference runner (i.e. :beam-runners-reference-job-server).
>
> I would still appreciate it if someone could try out the python-on-flink
> instructions
> (https://beam.apache.org/roadmap/portability/#python-on-flink) in case
> it is just due to my local machine setup. Thanks!
>
>
>
> On Thu, Nov 8, 2018 at 5:04 PM Ruoyun Huang <ruo...@google.com> wrote:
>
> Thanks Maximilian!
>
> I am working on migrating the existing PortableRunner to use the Java
> ULR (Link to Notes:
> https://docs.google.com/document/d/1S86saZqiDaE_M5wxO0zOQ_rwC6QHv7sp1BmGTm0dLNE/edit#).
> If this issue is non-trivial to solve, I would vote for removing this
> default behavior as part of the consolidation.
>
> On Thu, Nov 8, 2018 at 2:58 AM Maximilian Michels <m...@apache.org> wrote:
>
> In the long run, we should get rid of the Docker-inside-Docker
> approach, which was only intended for testing anyway. It would be
> cleaner to start the SDK harness container alongside the JobServer
> container.
>
> Short term, I think it should be easy to either fix the permissions of
> the mounted "docker" executable or use a Docker image for the JobServer
> which comes with Docker pre-installed.
>
> JIRA: https://issues.apache.org/jira/browse/BEAM-6020
>
> Thanks for reporting this Ruoyun!
>
> -Max
>
> On 08.11.18 00:10, Ruoyun Huang wrote:
> > Thanks Ankur and Maximilian.
> >
> > Just for reference, in case other people encounter the same error
> > message: the "permission denied" error in my original email is due
> > to exactly the Docker-inside-Docker issue that Ankur mentioned.
> > Thanks Ankur! I didn't make the link when you said it and had to
> > discover it the hard way (I thought my Docker installation was
> > messed up).
> >
> > On Tue, Nov 6, 2018 at 1:53 AM Maximilian Michels <m...@apache.org> wrote:
> >
> > Hi,
> >
> > Please follow
> > https://beam.apache.org/roadmap/portability/#python-on-flink
> >
> > Cheers,
> > Max
> >
> > On 06.11.18 01:14, Ankur Goenka wrote:
> > > Hi,
> > >
> > > The Portable Runner requires a job server URI to work with. The
> > > current default job server Docker image is broken because of the
> > > Docker-inside-Docker issue.
> > >
> > > Please refer to
> > > https://beam.apache.org/roadmap/portability/#python-on-flink for how
> > > to run a wordcount using the Portable Flink Runner.
> > >
> > > Thanks,
> > > Ankur
> > >
> > > On Mon, Nov 5, 2018 at 3:41 PM Ruoyun Huang <ruo...@google.com> wrote:
> > >
> > > Hi, Folks,
> > >
> > > I want to try out the Python PortableRunner by using the following
> > > command:
> > >
> > > *sdk/python: python -m apache_beam.examples.wordcount --output=/tmp/test_output --runner PortableRunner*
> > >
> > > It complains with the following error message:
> > >
> > > Caused by: java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Cannot run program "docker": error=13, Permission denied
> > >     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> > >     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> > >     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> > >     ... 1 more
> > > Caused by: org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Cannot run program "docker": error=13, Permission denied
> > >     at org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4994)
> > >     ... 7 more
> > >
> > >
> > >
> > > My py2 environment is properly configured, because DirectRunner
> > > works. I also tested my Docker installation with 'docker run
> > > hello-world'; no issue.
> > >
> > >
> > > Thanks.
> > > --
> > > ================
> > > Ruoyun Huang
> > >
> >
> >
> >
> > --
> > ================
> > Ruoyun Huang
> >
>
>
>
> --
> ================
> Ruoyun Huang
>
>
>
> --
> ================
> Ruoyun Huang
>
--
================
Ruoyun Huang