Hi,
We are building a data pipeline with the Beam Python SDK and trying to run it
on Dataflow, but we get the following error:

*A setup error was detected in
beamapp-xxxxyyyy-0322102737-03220329-8a74-harness-lm6v. Please refer to the
worker-startup log for detailed information.*

However, we could not find the detailed worker-startup logs.

We tried increasing the memory size, worker count, etc., but we still get the
same error.

Here is the command we use:
*python run.py \*
*--project=xyz \*
*--runner=DataflowRunner \*
*--staging_location=gs://xyz/staging \*
*--temp_location=gs://xyz/temp \*
*--requirements_file=requirements.txt \*
*--worker_machine_type n1-standard-8 \*
*--num_workers 2*


Pipeline snippet:

*data = pipeline | "load data" >> beam.io.Read(*
*    beam.io.BigQuerySource(query="SELECT * FROM abc_table LIMIT 100")*
*)*

*data | "filter data" >> beam.Filter(lambda x: x.get('column_name') == value)*


The pipeline above just loads data from BigQuery and filters it on a column
value. It works like a charm with the DirectRunner but fails on Dataflow.
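
To make the filter step concrete, here is a minimal sketch of the predicate written as a plain function that runs without Beam. The names COLUMN_NAME, TARGET_VALUE and keep_row, as well as the sample rows, are placeholders for illustration only and are not our real column, value, or data. It assumes rows are plain dicts, as the x.get(...) call in the lambda implies.

```python
# Hypothetical placeholders: the real column name and target value are not
# part of the snippet above.
COLUMN_NAME = "column_name"
TARGET_VALUE = "some_value"


def keep_row(row, column=COLUMN_NAME, target=TARGET_VALUE):
    """Return True when the row's column equals the target value.

    Rows are assumed to be plain dicts, mirroring x.get(...) in the lambda.
    """
    return row.get(column) == target


# Stand-in rows for a local check; this is not real table data.
sample_rows = [
    {"column_name": "some_value", "id": 1},
    {"column_name": "other", "id": 2},
    {"id": 3},  # column missing: row.get returns None, so the row is dropped
]

kept = [row for row in sample_rows if keep_row(row)]
print(kept)
```

In the pipeline, a function like this could presumably be passed in place of the lambda, i.e. beam.Filter(keep_row).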

Are we making an obvious setup mistake? Is anyone else getting the same error?
We could use some help resolving the issue.


-- 

*Rajesh Hegde | Lead Product Developer | Datalicious*
*e*: rhe...@datalicious.com | *m*: +919167571827
*a*: L-77, 15th Cross Rd, Sector 6, HSR Layout,
Bangalore Karnataka- 560102
*w*: www.datalicious.com