svetakvsundhar commented on code in PR #17159:
URL: https://github.com/apache/beam/pull/17159#discussion_r870899342
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -2525,6 +2526,12 @@ def _get_pipeline_details(unused_elm):
**self._kwargs))
| _PassThroughThenCleanupTempDatasets(project_to_cleanup_pcoll))
+ def get_pcoll_from_schema(table_schema):
+ pcoll_val = apache_beam.io.gcp.bigquery_schema_tools.\
+ produce_pcoll_with_schema(table_schema)
+ return beam.Map(lambda values: pcoll_val(**values)).with_output_types(
Review Comment:
Ah ok, I think I've found the root cause:
https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors
It looks like the pickling happens in my global namespace, and that state is not
carried over to the Dataflow worker. Passing in `--save_main_session`
should do the trick. See https://issues.apache.org/jira/browse/BEAM-6158 for a
similar-looking stack trace that is tangentially related.
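
For anyone following along, here is a rough, stdlib-only sketch of the failure mode as I understand it. It does not use Beam's actual pickler. The helpers `rebuild_on_worker` and `call_on_worker` are hypothetical names made up for illustration, and the "worker" is just a fresh globals dict, so treat it as an approximation of the mechanism rather than what Dataflow really does.

```python
import math
import types


def circle_area(radius):
    # Relies on the module-level name `math`, not on a local import.
    return math.pi * radius ** 2


def rebuild_on_worker(func, worker_globals):
    """Hypothetical helper: rebind func's code to the worker's global namespace."""
    return types.FunctionType(func.__code__, dict(worker_globals), func.__name__)


def call_on_worker(func, worker_globals, *args):
    """Return the result, or the NameError message if a global is missing."""
    try:
        return rebuild_on_worker(func, worker_globals)(*args)
    except NameError as err:
        return f"NameError: {err}"


# Worker without the main session: `math` was never defined there,
# so the global lookup inside circle_area fails.
without_session = call_on_worker(circle_area, {}, 1.0)

# Worker with the main session's globals restored (roughly the effect that
# --save_main_session is meant to have): the lookup succeeds again.
with_session = call_on_worker(circle_area, {"math": math}, 1.0)

print(without_session)
print(with_session)
```

The first call should report a NameError for `math`, while the second returns the area. Moving the import inside the function would be another way to avoid depending on the main session at all.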
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.