Unfortunately, the Flink job server still doesn't work consistently on my machine. Funny thing is, it did work ONCE (:beam-sdks-python:portableWordCount BUILD SUCCESSFUL, finished in 18s). When I tried again, things were back to hanging, with the server printing messages like:
""" [flink-akka.actor.default-dispatcher-25] DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received slot report from instance 1ad9060bcc87cf5fd19c9a233c15a18f. [flink-akka.actor.default-dispatcher-25] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request. [flink-akka.actor.default-dispatcher-23] DEBUG org.apache.flink.runtime.taskexecutor.TaskExecutor - Received heartbeat request from 006b3653dc7a24471c115d70c4c55fa6. [flink-akka.actor.default-dispatcher-25] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Received heartbeat from e188c32c-cfa5-4b85-bda9-16ce4742c490. ... repeat above forever after 5 minutes. """ I am trying to figure out what I did right for that one time succeeded run. For the step 3 Thomas mentioned, all I did for cleanup is "gradle clean", if there are actually more to do, please kindly let me know. On Mon, Nov 19, 2018 at 6:00 AM Maximilian Michels <m...@apache.org> wrote: > Thanks for investing, Thomas! > > Ruoyun, does that solve the WordCount problem you were experiencing? > > -Max > > On 19.11.18 04:53, Thomas Weise wrote: > > With latest master the problem seems fixed. Unfortunately that was first > > masked by build and docker issues. But I changed multiple things at once > > after getting nowhere (the container build "succeeded" when in fact it > > did not): > > > > * Update to latest docker > > * Increase docker disk space after seeing a spurious, non-reproducible > > message in one of the build attempts > > * Full clean and manually remove Go build residuals from the workspace > > > > After that I could see Go and container builds execute differently > > (longer build time) and the result certainly looks better.. 
> >
> > HTH,
> > Thomas
> >
> > On Sun, Nov 18, 2018 at 2:11 PM Ruoyun Huang <ruo...@google.com
> > <mailto:ruo...@google.com>> wrote:
> >
> >     I was after the same issue (I was using the reference runner job
> >     server, but same error message), and had some clues but no conclusion
> >     yet.
> >
> >     By retaining the container instance, the error message says "bad MD5"
> >     (see the other thread [1] I asked about on dev last week). My
> >     hypothesis, based on the symptoms, is that the underlying container
> >     expects an MD5 to validate staged files, but the job request from the
> >     Python SDK does not send a file hash. Hope someone can confirm whether
> >     that is the case (I am still trying to understand why Dataflow does
> >     not have this issue), and if so, the best way to fix it.
> >
> >     [1]
> >     https://lists.apache.org/thread.html/b26560087ff88f142e26d66c8a5a9283558c8e55b5edd705b5e53c9c@%3Cdev.beam.apache.org%3E
> >
> >     On Fri, Nov 16, 2018 at 7:06 PM Thomas Weise <t...@apache.org
> >     <mailto:t...@apache.org>> wrote:
> >
> >         Since the last few days, the steps under
> >         https://beam.apache.org/roadmap/portability/#python-on-flink
> >         are broken.
> >
> >         The gradle task hangs because the job server isn't able to
> >         launch the docker container.
> >
> >         ./gradlew :beam-sdks-python:portableWordCount -PjobEndpoint=localhost:8099
> >
> >         [CHAIN MapPartition (MapPartition at
> >         36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0) ->
> >         FlatMap (FlatMap at
> >         36write/Write/WriteImpl/DoOnce/Impulse.None/beam:env:docker:v1:0/out.0)
> >         (8/8)] INFO
> >         org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory
> >         - Still waiting for startup of environment
> >         tweise-docker-apache.bintray.io/beam/python:latest
> >         <http://tweise-docker-apache.bintray.io/beam/python:latest> for
> >         worker id 1
> >
> >         Unfortunately this isn't covered by tests yet. Is anyone aware
> >         what change may have caused this or looking into resolving it?
> >
> >         Thanks,
> >         Thomas
> >
> >
> > --
> > ================
> > Ruoyun Huang

--
================
Ruoyun Huang