[
https://issues.apache.org/jira/browse/BEAM-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088222#comment-16088222
]
ASF GitHub Bot commented on BEAM-2595:
--------------------------------------
GitHub user sb2nov opened a pull request:
https://github.com/apache/beam/pull/3563
[BEAM-2595] Allow table schema objects in BQ DoFn
Cherry pick from master for BEAM-2535
R: @aaltay
cc @jbonofre
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sb2nov/beam BEAM-2595-cp
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3563.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3563
----
commit ada4733b02bc38b1ef619fb991c068822a917595
Author: Sourabh Bajaj <[email protected]>
Date: 2017-07-13T19:02:31Z
[BEAM-2595] Allow table schema objects in BQ DoFn
----
> WriteToBigQuery does not work with nested json schema
> -----------------------------------------------------
>
> Key: BEAM-2595
> URL: https://issues.apache.org/jira/browse/BEAM-2595
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
> Affects Versions: 2.1.0
> Environment: mac os local runner, Python
> Reporter: Andrea Pierleoni
> Assignee: Sourabh Bajaj
> Priority: Minor
> Labels: gcp
> Fix For: 2.1.0
>
>
> I am trying to use the new `WriteToBigQuery` PTransform added to
> `apache_beam.io.gcp.bigquery` in version 2.1.0-RC1.
> I need to write to a BigQuery table with nested fields.
> The only way to specify nested schemas in BigQuery is with the JSON schema.
> None of the classes in `apache_beam.io.gcp.bigquery` are able to parse the
> JSON schema, but they accept a schema as an instance of the class
> `apache_beam.io.gcp.internal.clients.bigquery.TableFieldSchema`.
> I am composing the `TableFieldSchema` as suggested here
> [https://stackoverflow.com/questions/36127537/json-table-schema-to-bigquery-tableschema-for-bigquerysink/45039436#45039436],
> and it looks fine when passed to the PTransform `WriteToBigQuery`.
> The problem is that the base class `PTransformWithSideInputs` tries to pickle
> and unpickle the function
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L515]
> (which includes the `TableFieldSchema` instance), and for some reason when the
> class is unpickled some `FieldList` instances are converted to simple lists,
> so the pickling validation fails.
> Would it be possible to extend the test coverage to nested JSON objects for
> BigQuery?
> They are also relatively easy to parse into a `TableFieldSchema`.
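The recursive parsing that the description calls relatively easy could look roughly like the sketch below. It is an illustration only: `FieldSketch`, `parse_fields`, and `field_paths` are hypothetical names used so that the example runs without Beam installed. They are not the Beam `TableFieldSchema` class or any Beam API, and the `{"fields": [...]}` layout of the input is an assumed shape for the JSON schema.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldSketch:
    """Hypothetical stand-in for a table field schema (NOT the Beam class)."""
    name: str
    type: str
    mode: str = "NULLABLE"
    fields: List["FieldSketch"] = field(default_factory=list)


def parse_fields(json_fields):
    """Recursively convert a list of JSON field dicts into FieldSketch objects."""
    parsed = []
    for item in json_fields:
        parsed.append(FieldSketch(
            name=item["name"],
            type=item["type"],
            mode=item.get("mode", "NULLABLE"),
            # Nested (RECORD) fields carry their children under "fields".
            fields=parse_fields(item.get("fields", [])),
        ))
    return parsed


def field_paths(fields, prefix=""):
    """Yield dotted paths for every field, parents before their children."""
    for f in fields:
        path = prefix + f.name
        yield path
        yield from field_paths(f.fields, path + ".")


# Assumed input shape: a JSON object with a top-level "fields" list.
SCHEMA_JSON = """
{"fields": [
  {"name": "id", "type": "STRING", "mode": "REQUIRED"},
  {"name": "address", "type": "RECORD", "fields": [
    {"name": "city", "type": "STRING"},
    {"name": "geo", "type": "RECORD", "fields": [
      {"name": "lat", "type": "FLOAT"},
      {"name": "lon", "type": "FLOAT"}
    ]}
  ]}
]}
"""

schema_fields = parse_fields(json.loads(SCHEMA_JSON)["fields"])
paths = list(field_paths(schema_fields))
```

In a real pipeline the same recursion would build `TableFieldSchema` objects instead of `FieldSketch`, and a test for nested schemas could then compare the object before and after a pickle round trip, which is the step reported as failing here.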
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)