[GitHub] [beam] svetakvsundhar commented on a diff in pull request #17159: [WIP][BEAM-11587] Generate PColl element from TableSchema

GitBox Wed, 11 May 2022 19:52:06 -0700


svetakvsundhar commented on code in PR #17159:
URL: https://github.com/apache/beam/pull/17159#discussion_r870899342



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -2525,6 +2526,12 @@ def _get_pipeline_details(unused_elm):
                 **self._kwargs))
         | _PassThroughThenCleanupTempDatasets(project_to_cleanup_pcoll))
 
+  def get_pcoll_from_schema(table_schema):
+    pcoll_val = apache_beam.io.gcp.bigquery_schema_tools.\
+        produce_pcoll_with_schema(table_schema)
+    return beam.Map(lambda values: pcoll_val(**values)).with_output_types(

Review Comment:
   Ah ok, I think I've found the root cause (RC): 
https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors
   
   It looks like the pickling is happening in my global namespace, and that 
state is not carried over to the Dataflow worker. A fix such as passing in 
`--save_main_session` should do the trick. See 
https://issues.apache.org/jira/browse/BEAM-6158 for a similar-looking stack 
trace that is tangentially related. 



-- 
This is an automated message from the Apache Git Service.