[
https://issues.apache.org/jira/browse/BEAM-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091820#comment-17091820
]
Kenneth Knowles edited comment on BEAM-9613 at 4/24/20, 6:37 PM:
-----------------------------------------------------------------
JSON is a textual format in the same way as any programming language literal.
The text is an encoding. We don't usually think about programs and values in
programs as text strings, but you can if you want... The value and type of a
JSON value is unambiguous.
- The JSON value {{\{ pi: 3.14159 \}}} has a single field "pi" and its value is
the number {{3.14159}}
- The JSON value {{\{ pi: "3.14159" \}}} has a single field "pi" and its value
is the string {{"3.14159"}}
JSON has one number type (specified to be arbitrary precision, IIRC), not
separate integer and float types.
These are distinct values. A TableRow for a BQ table with a FLOAT64 field may
choose one or the other (likely), export both in different contexts (yuck),
accept both and fuzzily do best-effort coercion on writes, etc.
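To make the distinction concrete, here is a minimal sketch using Jackson's
{{ObjectMapper}} (any JSON parser would do) showing that the two encodings
above surface as different Java types when deserialized into a generic map,
which is essentially what a {{TableRow}} is:
{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class JsonNumberVsString {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();

    // A JSON number deserializes to java.lang.Double in a generic map.
    Map<?, ?> asNumber = mapper.readValue("{\"pi\": 3.14159}", Map.class);
    System.out.println(asNumber.get("pi").getClass()); // class java.lang.Double

    // A JSON string deserializes to java.lang.String, even though a human
    // reader may consider it "the same" value.
    Map<?, ?> asString = mapper.readValue("{\"pi\": \"3.14159\"}", Map.class);
    System.out.println(asString.get("pi").getClass()); // class java.lang.String
  }
}
{code}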
What we need is something like
[https://cloud.google.com/bigquery/docs/exporting-data#avro_export_details] for
the JSON format, and to make sure our treatment of the Avro and JSON formats is
consistent. It seems that when we read the Avro export and then convert to
TableRow, we end up with a different TableRow than we would if we did a JSON
export and read it in, or read the row directly as a TableRow.
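Whichever policy we pick, the conversion path that raises the exception below
({{BigQueryUtils.toBeamValue}}) would have to handle both encodings explicitly.
A rough illustration of the "accept both" option, as a hypothetical helper
rather than the actual Beam code:
{code:java}
// Hypothetical helper sketching the "accept both" policy for a FLOAT64 /
// DOUBLE field. This is an illustration, not the BigQueryUtils implementation.
public class DoubleCoercion {
  static Double toDoubleValue(Object jsonValue) {
    if (jsonValue == null) {
      return null;
    }
    if (jsonValue instanceof Number) {
      // JSON number, e.g. { "pi": 3.14159 }, arrives as a java.lang.Double.
      return ((Number) jsonValue).doubleValue();
    }
    if (jsonValue instanceof String) {
      // JSON string, e.g. { "pi": "3.14159" }, as some export/read paths yield.
      return Double.parseDouble((String) jsonValue);
    }
    throw new UnsupportedOperationException(
        "Cannot convert " + jsonValue.getClass().getName() + " to DOUBLE");
  }
}
{code}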
> BigQuery IO not support convert double type for beam row
> --------------------------------------------------------
>
> Key: BEAM-9613
> URL: https://issues.apache.org/jira/browse/BEAM-9613
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: MAKSIM TSYGAN
> Priority: Major
>
> If I execute a query with a double column via BigQueryIO.readFrom(), I get an
> exception:
> Caused by: java.lang.UnsupportedOperationException: Converting BigQuery type
> 'class java.lang.Double' to 'FieldType{typeName=DOUBLE, nullable=true,
> logicalType=null, collectionElementType=null, mapKeyType=null,
> mapValueType=null, rowSchema=null, metadata={}}' is not supported
> at
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamValue(BigQueryUtils.java:532)
> at
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamRowFieldValue(BigQueryUtils.java:483)
> at
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.lambda$toBeamRow$6(BigQueryUtils.java:469)
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
> at
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils.toBeamRow(BigQueryUtils.java:470)