[ https://issues.apache.org/jira/browse/BEAM-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zaka Zaidan Azminur updated BEAM-8177:
--------------------------------------
    Description: 
I'm trying to create a simple test pipeline that exports BigQuery data as Parquet 
using BigQueryAvroUtils.java from Beam's code.

When trying to read the BigQuery data as an Avro GenericRecord, the code failed 
with this exception:
{code:java}
org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"record","name":"record","namespace":"Translated Avro Schema for record","doc":"org.apache.beam.sdk.io.gcp.bigquery","fields":[{"name":"key_2","type":["null","string"]},{"name":"key_1","type":["null","double"]}]}]: {"key_2": "asdasd", "key_1": 123123.123}
{code}
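For context on why this exception appears: Avro resolves a record datum against a union by the record schema's full name (namespace plus name), not by structurally comparing field lists, so two record schemas with identical fields but different namespaces are not interchangeable. A minimal sketch of that matching rule (a simplified illustrative model in plain Java, not Avro's actual implementation; the namespaces are hypothetical):

```java
import java.util.List;

public class UnionResolutionSketch {
    // Simplified stand-in for an Avro record schema: only the parts
    // relevant to union resolution (full name = namespace + "." + name).
    static final class RecordSchema {
        final String namespace;
        final String name;
        RecordSchema(String namespace, String name) {
            this.namespace = namespace;
            this.name = name;
        }
        String fullName() { return namespace + "." + name; }
    }

    // Avro picks a union branch for a record by full name, not by
    // comparing field lists; returns the branch index, or -1 where Avro
    // would throw UnresolvedUnionException ("Not in union ...").
    static int resolveUnion(List<RecordSchema> unionBranches, RecordSchema datumSchema) {
        for (int i = 0; i < unionBranches.size(); i++) {
            if (unionBranches.get(i).fullName().equals(datumSchema.fullName())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical namespaces: one from the translated schema, one
        // from BigQuery's own Avro export. Same record name and fields,
        // different namespace => different full names.
        RecordSchema generated = new RecordSchema("translated.ns", "record");
        RecordSchema exported = new RecordSchema("export.ns", "record");

        System.out.println(resolveUnion(List.of(generated), exported));  // -1
        System.out.println(resolveUnion(List.of(generated), generated)); // 0
    }
}
```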
I have checked the Avro schema and it matches its BigQuery schema counterpart.

Then I exported the BigQuery table as Avro using the BigQuery console and 
compared its schema with the one generated by BigQueryAvroUtils.java. It turns 
out the Avro namespace differs between the BigQueryAvroUtils.java output and 
the BigQuery export.

After patching BigQueryAvroUtils.java so that the generated schema matches the 
one from the BigQuery export, the exception went away.
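To illustrate the kind of change involved (this is not the actual patch): the fix amounts to making the translated schema carry the same record namespace as BigQuery's own Avro export, so the full names line up. A hedged sketch in plain Java that rewrites the namespace in a schema's JSON text; real code would rebuild the Schema object through Avro's API rather than edit JSON strings, and all names here are hypothetical:

```java
public class NamespaceFixSketch {
    // Illustrative only: replace the first "namespace" value in a schema
    // JSON string with the namespace the BigQuery export uses.
    static String alignNamespace(String schemaJson, String exportNamespace) {
        return schemaJson.replaceFirst(
            "\"namespace\":\"[^\"]*\"",
            "\"namespace\":\"" + exportNamespace + "\"");
    }

    public static void main(String[] args) {
        String generated =
            "{\"type\":\"record\",\"name\":\"record\",\"namespace\":\"translated.ns\",\"fields\":[]}";
        System.out.println(alignNamespace(generated, "export.ns"));
        // {"type":"record","name":"record","namespace":"export.ns","fields":[]}
    }
}
```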

So, I want to confirm whether there's a problem in my implementation or whether 
BigQuery generates a slightly different Avro schema.

I've created a simple reproduction along with the patch and a data sample: 
[https://github.com/zakazai/bq-to-parquet]

 

> BigQueryAvroUtils unable to convert field with record 
> ------------------------------------------------------
>
>                 Key: BEAM-8177
>                 URL: https://issues.apache.org/jira/browse/BEAM-8177
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.15.0
>            Reporter: Zaka Zaidan Azminur
>            Priority: Trivial
>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
