TheNeuralBit commented on code in PR #17159:
URL: https://github.com/apache/beam/pull/17159#discussion_r931673923


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -2422,6 +2422,9 @@ class ReadFromBigQuery(PTransform):
       to run queries with INTERACTIVE priority. This option is ignored when
       reading from a table rather than a query. To learn more about query
       priority, see: https://cloud.google.com/bigquery/docs/running-queries
+    output_type (str): By default, the schema returned from this transform
+      would be of type PYTHON_DICT. Other schema types can be specified
+      ("BEAM_ROW").

Review Comment:
   Thanks, I appreciate the ideas here :)
   
   It's not as simple as just setting the coder: RowCoder is parameterized by
the schema:
https://github.com/apache/beam/blob/d51b497fb229d75eef8b7baee98cdb817a592a58/sdks/python/apache_beam/coders/row_coder.py#L52-L59
   
   
https://github.com/apache/beam/blob/d51b497fb229d75eef8b7baee98cdb817a592a58/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L1059
   
   So we need to be able to infer the schema at pipeline construction time in
order to build a RowCoder.
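
To illustrate the constraint, here is a toy sketch using only the standard library. It is not Beam's actual RowCoder: `ToyRowCoder` and the list-of-pairs `Schema` are hypothetical names made up for this example. The point is that the encoded bytes carry no field names or types, so the coder cannot even be constructed, let alone decode anything, without the schema in hand up front.

```python
import struct
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for a schema: ordered (field name, field type) pairs.
Schema = List[Tuple[str, type]]


class ToyRowCoder:
    """Toy positional row coder (illustration only, not Beam's RowCoder)."""

    def __init__(self, schema: Schema):
        # The schema is required at construction time; nothing in the
        # encoded bytes describes the fields.
        self.schema = schema

    def encode(self, row: Dict[str, Any]) -> bytes:
        out = bytearray()
        for name, typ in self.schema:
            value = row[name]
            if typ is int:
                out += struct.pack(">q", value)  # 8-byte big-endian integer
            elif typ is str:
                data = value.encode("utf-8")
                out += struct.pack(">I", len(data)) + data  # length prefix + bytes
            else:
                raise TypeError(f"unsupported field type: {typ!r}")
        return bytes(out)

    def decode(self, data: bytes) -> Dict[str, Any]:
        row: Dict[str, Any] = {}
        offset = 0
        for name, typ in self.schema:
            if typ is int:
                (row[name],) = struct.unpack_from(">q", data, offset)
                offset += 8
            elif typ is str:
                (length,) = struct.unpack_from(">I", data, offset)
                offset += 4
                row[name] = data[offset:offset + length].decode("utf-8")
                offset += length
            else:
                raise TypeError(f"unsupported field type: {typ!r}")
        return row


schema = [("id", int), ("name", str)]
coder = ToyRowCoder(schema)
encoded = coder.encode({"id": 7, "name": "beam"})
decoded = coder.decode(encoded)
```

With only the coder "type" and no schema, there would be no way to know how many fields `encoded` holds or where each one starts, which is the same reason the schema has to be known when the pipeline is constructed.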



