[ https://issues.apache.org/jira/browse/BEAM-5026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maximilian Michels closed BEAM-5026. ------------------------------------ Resolution: Fixed Assignee: Ankur Goenka > Portable flink wordcount fails sometimes due to non-existent source path in > FileBasedSink._check_state_for_finalize_write > ------------------------------------------------------------------------------------------------------------------------- > > Key: BEAM-5026 > URL: https://issues.apache.org/jira/browse/BEAM-5026 > Project: Beam > Issue Type: Bug > Components: examples-python, runner-flink, sdk-py-core > Affects Versions: 2.6.0 > Reporter: Ryan Williams > Assignee: Ankur Goenka > Priority: Minor > Fix For: 2.7.0 > > > Running portable flink wordcount locally: > In one terminal: > {code:java} > ./gradlew :beam-runners-flink_2.11-job-server:runShadow{code} > In another: > {code:java} > python -m apache_beam.examples.wordcount --harness_docker_image <image> > --input /etc/profile --output /tmp/py-wordcount-direct > --experiments=beam_fn_api --runner=PortableRunner > --job_endpoint=localhost:8099 --sdk_location=container{code} > Typically, the first time I run this for a given job-server instance, I see a > failure like this ([full > output|https://gist.github.com/ryan-williams/a96bf259898b6260cd4f00b8a232057c#file-gistfile1-txt-L3460]): > {code:java} > File "apache_beam/runners/common.py", line 661, in > apache_beam.runners.common._OutputProcessor.process_outputs > def process_outputs(self, windowed_input_element, results): > File "apache_beam/runners/common.py", line 676, in > apache_beam.runners.common._OutputProcessor.process_outputs > for result in results: > File "/usr/local/lib/python2.7/site-packages/apache_beam/io/iobase.py", line > 1074, in <genexpr> > return (window.TimestampedValue(v, window.MAX_TIMESTAMP) for v in outputs) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", > line 271, in finalize_write > self._check_state_for_finalize_write(writer_results, num_shards)) > File > "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", > line 249, in _check_state_for_finalize_write > src, dst)) > BeamIOError: src and dst files do not exist. src: > /tmp/beam-temp-py-wordcount-direct-6a0d8862908c11e88de8025000000001/5cfa9f22-9246-41fb-adef-ca04d5a5fe50.py-wordcount-direct, > dst: /tmp/py-wordcount-direct-00000-of-00001 with exceptions None [while > running 'write/Write/WriteImpl/FinalizeWrite'] with exceptions None > {code} > This is after a fix to [a slightly earlier failure in {{FileBasedSink}} > documented on > BEAM-4742|https://issues.apache.org/jira/browse/BEAM-4742?focusedCommentId=16545622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16545622] > which I've been working on in > [#5903|https://github.com/apache/beam/pull/5903]. > It typically occurs only on the first run of wordcount against a given > job-server instance. > I'm curious whether others see this, whether it's some race condition in the > FileBasedSink, LocalFileSystem, my macbook's disk, or somewhere else, or > whether some temporary directory is getting created on the first run (for > each job-server) that explains why subsequent wordcount runs succeed, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)