Fabian created BEAM-7266:
----------------------------
Summary: Pipeline run does not terminate because Dataflow
runner cannot close file system writer
Key: BEAM-7266
URL: https://issues.apache.org/jira/browse/BEAM-7266
Project: Beam
Issue Type: Bug
Components: io-python-gcp, runner-dataflow
Affects Versions: 2.11.0
Reporter: Fabian
We are using Apache Beam 2.11.0 (Python SDK) with the Dataflow
runner on the Google Cloud Platform. Two pipeline runs did not
terminate, i.e. after multiple days (instead of a few minutes) they were still
running. The only error that was logged is:
It fails to close a writer:
{code:java}
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 178, in execute
    op.finish()
  File "dataflow_worker/native_operations.py", line 93, in dataflow_worker.native_operations.NativeWriteOperation.finish
    def finish(self):
  File "dataflow_worker/native_operations.py", line 94, in dataflow_worker.native_operations.NativeWriteOperation.finish
    with self.scoped_finish_state:
  File "dataflow_worker/native_operations.py", line 95, in dataflow_worker.native_operations.NativeWriteOperation.finish
    self.writer.__exit__(None, None, None)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativeavroio.py", line 277, in __exit__
    self._data_file_writer.close()
  File "/usr/local/lib/python2.7/dist-packages/avro/datafile.py", line 220, in close
    self.writer.close()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filesystemio.py", line 202, in close
    self._uploader.finish()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/gcsio.py", line 606, in finish
    raise self._upload_thread.last_error # pylint: disable=raising-bad-type
NotImplementedError
{code}
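
The last frame of the traceback (raise self._upload_thread.last_error) suggests that the exception was raised inside a background upload thread, stored on that thread, and re-raised only when the writer was closed. A minimal sketch of that pattern is given here for illustration. It is not the actual gcsio implementation: the names SketchUploader, failing_upload and finish_and_report are hypothetical, and the sketch does not attempt to reproduce the hang itself.

```python
import threading


class SketchUploader(object):
    """Hypothetical stand-in for a background uploader (not Beam's class)."""

    def __init__(self, upload_fn):
        self.last_error = None
        self._upload_fn = upload_fn
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True
        self._thread.start()

    def _run(self):
        # Any exception in the worker thread is captured, not propagated.
        try:
            self._upload_fn()
        except Exception as e:
            self.last_error = e

    def finish(self):
        # Wait for the upload thread, then surface its stored error, if any.
        self._thread.join()
        if self.last_error is not None:
            raise self.last_error


def failing_upload():
    # Assumption: simulates an upload path that raises NotImplementedError.
    raise NotImplementedError()


def finish_and_report(uploader):
    # Closes the uploader and reports whether the stored error surfaced.
    try:
        uploader.finish()
    except NotImplementedError:
        return "upload failed: NotImplementedError"
    return "upload ok"
```

In this sketch the error becomes visible only in finish(), which matches the point in the traceback where the writer is closed.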