[ https://issues.apache.org/jira/browse/BEAM-11047?focusedWorklogId=498606&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-498606 ]
ASF GitHub Bot logged work on BEAM-11047:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Oct/20 15:06
Start Date: 09/Oct/20 15:06
Worklog Time Spent: 10m
Work Description: kmjung commented on pull request #13058:
URL: https://github.com/apache/beam/pull/13058#issuecomment-706236204
The purpose of the convertGenericRecordToTableRow method is to provide
compatibility with BigQuery's legacy JSON TableRow type. I can't find any
documentation for this, but my understanding is that, regrettably, INTEGER
values are sent as strings in the original JSON, so the conversion is done here
for compatibility reasons. I don't think we can change this at this point
anyway, as existing pipelines rely on this behavior.
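
For pipelines that stay on TableRow, the practical consequence of the behavior described above is that INTEGER columns have to be parsed back from String. A minimal sketch of that, using only the JDK: a plain Map<String, Object> stands in for TableRow (which is itself a map of field names to values), and the class name, the readInteger helper and the "user_id" column are hypothetical names chosen for illustration, not part of Beam.

```java
import java.util.HashMap;
import java.util.Map;

public class LegacyIntegerField {

    /**
     * Hypothetical helper: reads an INTEGER column from a TableRow-like map.
     * Accepts the legacy String encoding as well as a plain Number, so the
     * caller does not depend on which representation the row carries.
     */
    public static long readInteger(Map<String, Object> row, String field) {
        Object value = row.get(field);
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        if (value instanceof String) {
            // Legacy JSON encoding: the 64-bit integer arrives as a decimal string.
            return Long.parseLong((String) value);
        }
        throw new IllegalArgumentException(
            "Field " + field + " is missing or not an INTEGER: " + value);
    }

    public static void main(String[] args) {
        // Stand-in for a row produced by convertGenericRecordToTableRow.
        Map<String, Object> row = new HashMap<>();
        row.put("user_id", "42");

        long userId = readInteger(row, "user_id");
        System.out.println(userId + 1);
    }
}
```

Accepting both String and Number keeps the calling code independent of the representation, at the cost of one extra type check per field.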
If you're looking to optimize your Storage API pipelines, I would encourage
you to consume the GenericRecords produced by the stream source directly,
rather than converting them to TableRow.
Issue Time Tracking
-------------------
Worklog Id: (was: 498606)
Time Spent: 40m (was: 0.5h)
> BigQuery IO: Avro INTEGER values get converted to String objects
> ----------------------------------------------------------------
>
> Key: BEAM-11047
> URL: https://issues.apache.org/jira/browse/BEAM-11047
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.22.0, 2.23.0, 2.24.0
> Reporter: Jonas Grabber
> Priority: P2
> Labels: bigquery, java
> Time Spent: 40m
> Remaining Estimate: 0h
>
> For some reason, convertRequiredField used in
> BigQueryAvroUtils.convertGenericRecordToTableRow [casts values with the Avro
> INTEGER type to Long, but then converts them to String objects via
> toString|https://github.com/apache/beam/blob/v2.23.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java#L326].
> I'm not sure where else convertGenericRecordToTableRow is used, but we rely
> on it for reads through BigQuery's Storage API.
> I'm fairly certain this is not the expected behaviour, because other types are
> converted properly, and the cast to Long already ensures that every value
> fits into a Long object.