Ryan Williams created BEAM-5026:
-----------------------------------

             Summary: Portable flink wordcount fails sometimes due to 
non-existent source path in FileBasedSink._check_state_for_finalize_write
                 Key: BEAM-5026
                 URL: https://issues.apache.org/jira/browse/BEAM-5026
             Project: Beam
          Issue Type: Bug
          Components: examples-python, runner-flink, sdk-py-core
    Affects Versions: 2.6.0
            Reporter: Ryan Williams
            Assignee: Ryan Williams


Running portable flink wordcount locally:

In one terminal:
{code:java}
./gradlew :beam-runners-flink_2.11-job-server:runShadow{code}
In another:
{code:java}
python -m apache_beam.examples.wordcount --harness_docker_image <image> --input 
/etc/profile --output /tmp/py-wordcount-direct --experiments=beam_fn_api 
--runner=PortableRunner --job_endpoint=localhost:8099 
--sdk_location=container{code}
Typically, the first time I run this for a given job-server instance, I see a 
failure like this ([full 
output|https://gist.github.com/ryan-williams/a96bf259898b6260cd4f00b8a232057c#file-gistfile1-txt-L3460]):
{code:java}
File "apache_beam/runners/common.py", line 661, in 
apache_beam.runners.common._OutputProcessor.process_outputs
def process_outputs(self, windowed_input_element, results):
File "apache_beam/runners/common.py", line 676, in 
apache_beam.runners.common._OutputProcessor.process_outputs
for result in results:
File "/usr/local/lib/python2.7/site-packages/apache_beam/io/iobase.py", line 
1074, in <genexpr>
return (window.TimestampedValue(v, window.MAX_TIMESTAMP) for v in outputs)
File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", 
line 271, in finalize_write
self._check_state_for_finalize_write(writer_results, num_shards))
File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", 
line 249, in _check_state_for_finalize_write
src, dst))
BeamIOError: src and dst files do not exist. src: 
/tmp/beam-temp-py-wordcount-direct-6a0d8862908c11e88de8025000000001/5cfa9f22-9246-41fb-adef-ca04d5a5fe50.py-wordcount-direct,
 dst: /tmp/py-wordcount-direct-00000-of-00001 with exceptions None [while 
running 'write/Write/WriteImpl/FinalizeWrite'] with exceptions None
{code}

This is after a fix to [a slightly earlier failure in {{FileBasedSink}} 
documented on 
BEAM-4742|https://issues.apache.org/jira/browse/BEAM-4742?focusedCommentId=16545622&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16545622]
 which I've been working on in [#5903|https://github.com/apache/beam/pull/5903].

It typically occurs only on the first run of wordcount against a given 
job-server instance.

I'm curious whether others see this, whether it's some race condition in the 
FileBasedSink, LocalFileSystem, my macbook's disk, or somewhere else, or 
whether some temporary directory is getting created on the first run (for each 
job-server) that explains why subsequent wordcount runs succeed, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to