[ 
https://issues.apache.org/jira/browse/BEAM-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083300#comment-16083300
 ] 

Ahmet Altay commented on BEAM-2595:
-----------------------------------

[~andrea.pierleoni] Thank you for reporting this. Could you share the error you 
are getting?

[~sb2nov] Could you verify whether this is a regression or not? If it is a 
regression, can we mitigate it (add a comment/document recommending the old 
way) before the release goes out?

In addition to the fix, I agree that we need a test if we don't have one. We 
should also update the examples (e.g. 
https://github.com/apache/beam/blob/91c7d3d1f7d72e84e773c1adbffed063aefdff3b/sdks/python/apache_beam/examples/cookbook/bigquery_schema.py#L116)

cc: [~chamikara]


> WriteToBigQuery does not work with nested json schema
> -----------------------------------------------------
>
>                 Key: BEAM-2595
>                 URL: https://issues.apache.org/jira/browse/BEAM-2595
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>    Affects Versions: 2.1.0
>         Environment: mac os local runner, Python
>            Reporter: Andrea Pierleoni
>            Assignee: Sourabh Bajaj
>            Priority: Minor
>              Labels: gcp
>             Fix For: 2.1.0
>
>
> I am trying to use the new `WriteToBigQuery` PTransform added to 
> `apache_beam.io.gcp.bigquery` in version 2.1.0-RC1.
> I need to write to a BigQuery table with nested fields.
> The only way to specify nested schemas in BigQuery is with the JSON schema.
> None of the classes in `apache_beam.io.gcp.bigquery` are able to parse the 
> JSON schema, but they accept a schema as an instance of the class 
> `apache_beam.io.gcp.internal.clients.bigquery.TableFieldSchema`.
> I am composing the `TableFieldSchema` as suggested here 
> [https://stackoverflow.com/questions/36127537/json-table-schema-to-bigquery-tableschema-for-bigquerysink/45039436#45039436],
>  and it looks fine when passed to the PTransform `WriteToBigQuery`. 
> The problem is that the base class `PTransformWithSideInputs` tries to pickle 
> and unpickle the function 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L515]
> (which includes the TableFieldSchema instance), and for some reason when the 
> class is unpickled some `FieldList` instances are converted to plain lists, 
> and the pickling validation fails.
> Would it be possible to extend the test coverage to nested JSON schemas for 
> BigQuery?
> They are also relatively easy to parse into a TableFieldSchema.
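
For reference, the recursive parse the reporter describes can be sketched as 
below. Since the real `TableFieldSchema` is a generated client class, this 
sketch uses a minimal dataclass stand-in with the same `name`/`type`/`mode`/
`fields` attributes; the field names in `json_schema` are made up for 
illustration, and only the recursion pattern is the point:

```python
from dataclasses import dataclass, field
from typing import List


# Stand-in for apache_beam.io.gcp.internal.clients.bigquery.TableFieldSchema.
# The real class is generated from the BigQuery API, but exposes the same
# name/type/mode/fields attributes used here.
@dataclass
class TableFieldSchema:
    name: str
    type: str
    mode: str = 'NULLABLE'
    fields: List['TableFieldSchema'] = field(default_factory=list)


def parse_field(spec):
    """Recursively convert one JSON-schema field dict into a TableFieldSchema."""
    return TableFieldSchema(
        name=spec['name'],
        type=spec['type'],
        mode=spec.get('mode', 'NULLABLE'),
        # RECORD fields carry a nested 'fields' list; recurse into it.
        fields=[parse_field(f) for f in spec.get('fields', [])])


# Hypothetical nested schema in BigQuery's JSON format.
json_schema = [
    {'name': 'id', 'type': 'STRING', 'mode': 'REQUIRED'},
    {'name': 'address', 'type': 'RECORD', 'mode': 'REPEATED', 'fields': [
        {'name': 'city', 'type': 'STRING'},
        {'name': 'zip', 'type': 'STRING'},
    ]},
]

fields = [parse_field(f) for f in json_schema]
```

The same walk, targeting the real generated classes, would be enough to let 
the BigQuery transforms accept a JSON schema directly.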



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
