Thanks Thomas! My desktop runs Linux. I was using Gradle to run wordcount, and that was how I got the hanging job. Since both of you got it working, I guess it is more likely that something is wrong with my setup.
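For cross-checking what a run actually wrote, here is a minimal sketch. It assumes the text sink writes sharded files named after the --output prefix (something like "<prefix>-00000-of-00001") rather than one file at the exact path, which is my understanding of the default behavior. The helper name find_output_shards is made up for illustration, and "/tmp/py-wordcount-direct" is simply the --output value from Thomas's command quoted below.

```python
import os


def find_output_shards(prefix):
    """Return the sorted paths of regular files whose names start with `prefix`.

    Hypothetical helper, not part of Beam: it only lists files on the local
    file system and knows nothing about the runner or the job.
    """
    directory, base = os.path.split(prefix)
    directory = directory or "."
    if not os.path.isdir(directory):
        return []
    matches = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Keep plain files only; skip directories that share the prefix.
        if name.startswith(base) and os.path.isfile(path):
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # Assumed prefix: the --output value used in this thread.
    shards = find_output_shards("/tmp/py-wordcount-direct")
    if not shards:
        print("no output files found")
    for shard in shards:
        print(shard, os.path.getsize(shard), "bytes")
```

If this prints nothing under the prefix, the job may have written somewhere other than the local file system of the machine the check runs on, which is worth ruling out before concluding that the run produced no output.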
Using Thomas's python command line exactly as is, I can see the job run succeed. However, I have two questions:

1) Did you check whether the output file "/tmp/py-wordcount-direct" exists? I expected a text output, but I don't see this file afterwards. (I am still at the stage of building confidence in telling what a successful run looks like. Maybe I will try DataflowRunner and cross-check the outputs.)

2) Why does it need a "--streaming" arg? Isn't this a static batch input, given that it is fed a txt file? In fact, I get a failure message if I remove '--streaming'; not sure if that is due to my setup again.

On Wed, Nov 14, 2018 at 7:51 AM Thomas Weise <t...@apache.org> wrote:

> Works for me on macOS as well.
>
> In case you don't launch the pipeline through Gradle, this would be the
> command:
>
> python -m apache_beam.examples.wordcount \
>   --input=/etc/profile \
>   --output=/tmp/py-wordcount-direct \
>   --runner=PortableRunner \
>   --job_endpoint=localhost:8099 \
>   --parallelism=1 \
>   --OPTIONALflink_master=localhost:8081 \
>   --streaming
>
> We talked about adding the wordcount to pre-commit..
>
> Regarding using ULR vs. Flink runner: There seems to be confusion between
> PortableRunner using the user supplied endpoint vs. trying to launch a job
> server. I commented in the doc.
>
> Thomas
>
> On Wed, Nov 14, 2018 at 3:30 AM Maximilian Michels <m...@apache.org> wrote:
>
>> Hi Ruoyun,
>>
>> I just ran the wordcount locally using the instructions on the page.
>> I've tried the local file system and GCS. Both times it ran successfully
>> and produced valid output.
>>
>> I'm assuming there is some problem with your setup. Which platform are
>> you using? I'm on MacOS.
>>
>> Could you expand on the planned merge? From my understanding we will
>> always need PortableRunner in Python to be able to submit against the
>> Beam JobServer.
>>
>> Thanks,
>> Max
>>
>> On 14.11.18 00:39, Ruoyun Huang wrote:
>> > A quick follow-up on using current PortableRunner.
>> >
>> > I followed the exact three steps as Ankur and Maximilian shared in
>> > https://beam.apache.org/roadmap/portability/#python-on-flink ; The
>> > wordcount example keeps hanging after 10 minutes. I also tried
>> > specifying explicit input/output args, either using gcs folder or local
>> > file system, but none of them works.
>> >
>> > Spent some time looking into it but no conclusion yet. At this point
>> > though, I guess it does not matter much any more, given we already have
>> > the plan of merging PortableRunner into using java reference runner
>> > (i.e. :beam-runners-reference-job-server).
>> >
>> > Still appreciated if someone can try out the python-on-flink
>> > <https://beam.apache.org/roadmap/portability/#python-on-flink> instructions
>> > in case it is just due to my local machine setup. Thanks!
>> >
>> > On Thu, Nov 8, 2018 at 5:04 PM Ruoyun Huang <ruo...@google.com> wrote:
>> >
>> >     Thanks Maximilian!
>> >
>> >     I am working on migrating existing PortableRunner to using java ULR
>> >     (Link to Notes
>> >     <https://docs.google.com/document/d/1S86saZqiDaE_M5wxO0zOQ_rwC6QHv7sp1BmGTm0dLNE/edit#>).
>> >     If this issue is non-trivial to solve, I would vote for removing
>> >     this default behavior as part of the consolidation.
>> >
>> >     On Thu, Nov 8, 2018 at 2:58 AM Maximilian Michels <m...@apache.org> wrote:
>> >
>> >         In the long run, we should get rid of the Docker-inside-Docker
>> >         approach, which was only intended for testing anyways. It would
>> >         be cleaner to start the SDK harness container alongside with
>> >         JobServer container.
>> >
>> >         Short term, I think it should be easy to either fix the
>> >         permissions of the mounted "docker" executable or use a Docker
>> >         image for the JobServer which comes with Docker pre-installed.
>> >
>> >         JIRA: https://issues.apache.org/jira/browse/BEAM-6020
>> >
>> >         Thanks for reporting this Ruoyun!
>> >
>> >         -Max
>> >
>> >         On 08.11.18 00:10, Ruoyun Huang wrote:
>> >         > Thanks Ankur and Maximilian.
>> >         >
>> >         > Just for reference in case other people encountering the same
>> >         > error message, the "permission denied" error in my original
>> >         > email is exactly due to docker-inside-docker issue that Ankur
>> >         > mentioned. Thanks Ankur! Didn't make the link when you said it,
>> >         > had to discover that in a hard way (I thought it is due to my
>> >         > docker installation messed up).
>> >         >
>> >         > On Tue, Nov 6, 2018 at 1:53 AM Maximilian Michels
>> >         > <m...@apache.org> wrote:
>> >         >
>> >         >     Hi,
>> >         >
>> >         >     Please follow
>> >         >     https://beam.apache.org/roadmap/portability/#python-on-flink
>> >         >
>> >         >     Cheers,
>> >         >     Max
>> >         >
>> >         >     On 06.11.18 01:14, Ankur Goenka wrote:
>> >         >     > Hi,
>> >         >     >
>> >         >     > The Portable Runner requires a job server uri to work
>> >         >     > with. The current default job server docker image is
>> >         >     > broken because of docker inside docker issue.
>> >         >     >
>> >         >     > Please refer to
>> >         >     > https://beam.apache.org/roadmap/portability/#python-on-flink
>> >         >     > for how to run a wordcount using Portable Flink Runner.
>> >         >     >
>> >         >     > Thanks,
>> >         >     > Ankur
>> >         >     >
>> >         >     > On Mon, Nov 5, 2018 at 3:41 PM Ruoyun Huang
>> >         >     > <ruo...@google.com> wrote:
>> >         >     >
>> >         >     >     Hi, Folks,
>> >         >     >
>> >         >     >     I want to try out Python PortableRunner, by using
>> >         >     >     following command:
>> >         >     >
>> >         >     >     sdk/python: python -m apache_beam.examples.wordcount --output=/tmp/test_output --runner PortableRunner
>> >         >     >
>> >         >     >     It complains with following error message:
>> >         >     >
>> >         >     >     Caused by: java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Cannot run program "docker": error=13, Permission denied
>> >         >     >         at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
>> >         >     >         at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>> >         >     >         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
>> >         >     >         ... 1 more
>> >         >     >     Caused by: org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Cannot run program "docker": error=13, Permission denied
>> >         >     >         at org.apache.beam.repackaged.beam_runners_java_fn_execution.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4994)
>> >         >     >         ... 7 more
>> >         >     >
>> >         >     >     My py2 environment is properly configured, because
>> >         >     >     DirectRunner works. Also I tested my docker
>> >         >     >     installation by 'docker run hello-world ', no issue.
>> >         >     >
>> >         >     >     Thanks.
>> >         >     >     --
>> >         >     >     ================
>> >         >     >     Ruoyun Huang
>> >         >
>> >         > --
>> >         > ================
>> >         > Ruoyun Huang
>> >
>> >     --
>> >     ================
>> >     Ruoyun Huang
>> >
>> > --
>> > ================
>> > Ruoyun Huang

--
================
Ruoyun Huang