Zdenko Hrcek created BEAM-7871:
----------------------------------

             Summary: Streaming from PubSub to Firestore doesn't work on Dataflow
                 Key: BEAM-7871
                 URL: https://issues.apache.org/jira/browse/BEAM-7871
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
    Affects Versions: 2.13.0
            Reporter: Zdenko Hrcek


I ran into the same error as described here 
[https://stackoverflow.com/questions/57059944/python-package-errors-while-running-gcp-dataflow]
 but I don't see it reported anywhere, so I am creating an issue just in case.

The pipeline is quite simple, reading from PubSub and writing to Firestore.
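
For context, here is a rough sketch of the pipeline shape and of the DoFn pattern the traceback below points at. The actual code is in the repository linked at the bottom; the project, topic, and collection names are placeholders, and the module-level firestore import is my assumption based on the NameError:
{code:python}
# Rough sketch only; names are placeholders, the real code is in the linked repo.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

# Assumed pattern: the client library is imported at module level...
from google.cloud import firestore


class FirestoreWriteDoFn(beam.DoFn):
    def start_bundle(self):
        # ...but referenced here, which is roughly where the traceback below
        # reports "NameError: global name 'firestore' is not defined".
        self.db = firestore.Client()

    def process(self, element):
        # Write each Pub/Sub message into a Firestore collection.
        self.db.collection(u'messages').add({u'message': element})


options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    (p
     | 'ReadFromPubSub' >> beam.io.ReadFromPubSub(
           topic='projects/<project>/topics/<topic>')
     | 'WriteToFirestore' >> beam.ParDo(FirestoreWriteDoFn()))
{code}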

The Beam version used is 2.13.0, with Python 2.7.

With the DirectRunner it works fine, but on Dataflow it throws the following error:

 
{code}
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -81: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 157, in _execute
    response = task()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 190, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 312, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 331, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 538, in process_bundle
    op.start()
  File "apache_beam/runners/worker/operations.py", line 554, in apache_beam.runners.worker.operations.DoOperation.start
    def start(self):
  File "apache_beam/runners/worker/operations.py", line 555, in apache_beam.runners.worker.operations.DoOperation.start
    with self.scoped_start_state:
  File "apache_beam/runners/worker/operations.py", line 557, in apache_beam.runners.worker.operations.DoOperation.start
    self.dofn_runner.start()
  File "apache_beam/runners/common.py", line 778, in apache_beam.runners.common.DoFnRunner.start
    self._invoke_bundle_method(self.do_fn_invoker.invoke_start_bundle)
  File "apache_beam/runners/common.py", line 775, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
    self._reraise_augmented(exn)
  File "apache_beam/runners/common.py", line 800, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    raise_with_traceback(new_exn)
  File "apache_beam/runners/common.py", line 773, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
    bundle_method()
  File "apache_beam/runners/common.py", line 359, in apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
    def invoke_start_bundle(self):
  File "apache_beam/runners/common.py", line 363, in apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
    self.signature.start_bundle_method.method_value())
  File "/home/zdenulo/dev/gcp_stuff/df_firestore_stream/df_firestore_stream.py", line 39, in start_bundle
NameError: global name 'firestore' is not defined [while running 'generatedPtransform-64']
{code}
Interestingly, using Beam version 2.12.0 solves the problem on Dataflow and the pipeline works as expected, so I am not sure what the cause could be.
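
In case it helps narrow this down, here is a hedged sketch of two generic workarounds that might sidestep the NameError, assuming the issue is that module-level imports are not visible on the Dataflow workers (not confirmed to be the actual cause of the 2.13.0 behavior):
{code:python}
# Both of these are assumptions / possible workarounds, not a confirmed fix.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions


# Option 1: import the client library inside the DoFn so the import
# happens on the worker itself instead of relying on the main session.
class FirestoreWriteDoFn(beam.DoFn):
    def start_bundle(self):
        from google.cloud import firestore
        self.db = firestore.Client()


# Option 2: ask Beam to pickle the main session so that module-level
# imports are available on the workers.
options = PipelineOptions()
options.view_as(SetupOptions).save_main_session = True
{code}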

Here is a repository with the code that was used: 
[https://github.com/zdenulo/dataflow_firestore_stream]

 


