[
https://issues.apache.org/jira/browse/BEAM-9769?focusedWorklogId=423204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-423204
]
ASF GitHub Bot logged work on BEAM-9769:
----------------------------------------
Author: ASF GitHub Bot
Created on: 16/Apr/20 02:30
Start Date: 16/Apr/20 02:30
Worklog Time Spent: 10m
Work Description: chunyang commented on issue #11433: [BEAM-9769]
Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614377074
Hmm, interesting. I agree that keeping JSON as the default is probably the
safer bet.
We have seen a case internally where the data provided to WriteToBigQuery is
a string-like date, e.g., `"2020-01-01"`. When writing with the JSON
intermediate format, the data shows up as a DATE column in BigQuery, but we
can't get the same behavior with the Avro format without doing one of the
following:
1. Specifying the schema for that column as DATE and modifying the incoming
PCollection to use `datetime.date`, or
2. Specifying the schema for that column as STRING, in which case it is no
longer a DATE column in BigQuery.
The second option is problematic when we're appending to an existing table
whose column is already DATE; in that case we have to modify the pipeline
(the first option) to keep appending to it.
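As a rough illustration of the first option, the sketch below converts
date-like strings in a row dict to `datetime.date` before the rows reach the
sink. The helper name `parse_date_fields`, the field name `day`, and the
`YYYY-MM-DD` input format are assumptions made for this example; none of them
come from Beam or from the PR.

```python
import datetime


def parse_date_fields(row, date_fields):
    """Return a copy of `row` with the named string fields parsed as dates.

    Hypothetical helper: assumes the values use the YYYY-MM-DD format.
    Non-string values (including None) are left untouched.
    """
    converted = dict(row)  # shallow copy so the input row is not mutated
    for name in date_fields:
        value = converted.get(name)
        if isinstance(value, str):
            converted[name] = datetime.datetime.strptime(
                value, "%Y-%m-%d").date()
    return converted


row = {"id": 1, "day": "2020-01-01"}
fixed = parse_date_fields(row, ["day"])
```

In a pipeline, a step such as `beam.Map(parse_date_fields, date_fields=["day"])`
placed ahead of WriteToBigQuery would presumably be enough, provided the schema
declares that column as DATE.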
fastavro 0.22.2 allows writing a string to a column defined with the date
logical type (PRs fastavro/fastavro#338 and fastavro/fastavro#349), but it
seems Beam pins fastavro to <0.22, so for now we can't take advantage of
that.
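For context, the Avro date logical type annotates an `int` that stores the
number of days since the Unix epoch, so any writer that accepts a date string
has to perform a conversion along these lines. This is only a sketch of the
idea, not fastavro's implementation; the function name and the `YYYY-MM-DD`
input format are assumptions.

```python
import datetime

EPOCH = datetime.date(1970, 1, 1)


def date_string_to_avro_days(value):
    """Convert a YYYY-MM-DD string to days since the Unix epoch.

    Hypothetical helper illustrating the int representation behind
    Avro's date logical type.
    """
    parsed = datetime.datetime.strptime(value, "%Y-%m-%d").date()
    return (parsed - EPOCH).days


days = date_string_to_avro_days("2020-01-01")  # 18262
```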
I believe your comments in CHANGES are accurate: some date-like and
datetime-like strings will behave differently in Avro vs. JSON format.
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 423204)
Time Spent: 3h (was: 2h 50m)
> Ensure JSON imports are the default behavior for BigQuerySink and
> WriteToBigQuery in Python
> -------------------------------------------------------------------------------------------
>
> Key: BEAM-9769
> URL: https://issues.apache.org/jira/browse/BEAM-9769
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Pablo Estrada
> Assignee: Pablo Estrada
> Priority: Major
> Fix For: 2.21.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)