chunyang edited a comment on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614377074
 
 
   Hmm, interesting. I agree keeping JSON as the default is probably the safer bet.
   
   We have seen a case internally where the data provided to WriteToBigQuery is a date-like string, e.g., `"2020-01-01"`. When writing with the JSON intermediate format, the data shows up as a DATE column in BigQuery, but we can't get the same behavior with the Avro format without doing one of:
   1. Specifying the schema for that column as DATE and modifying the incoming PCollection to use `datetime.date` values (see the sketch below), or
   2. Specifying the schema for that column as STRING, in which case it is no longer a DATE column in BigQuery.
   
   The second option is problematic when we're appending to an existing table, since we would have to modify the pipeline to keep appending to it.
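   To make the first option concrete, here is a minimal sketch (not code from this PR) of a pipeline that parses the date-like string into `datetime.date` and declares the column as DATE. The table and field names are hypothetical, chosen only for illustration:

```python
import datetime

import apache_beam as beam


def parse_date(row):
    # Incoming rows carry the date as a plain string, e.g. "2020-01-01";
    # convert it to datetime.date so it can be written as a DATE column.
    row['event_date'] = datetime.datetime.strptime(
        row['event_date'], '%Y-%m-%d').date()
    return row


with beam.Pipeline() as p:
    (
        p
        | beam.Create([{'event_date': '2020-01-01', 'name': 'example'}])
        | beam.Map(parse_date)
        | beam.io.WriteToBigQuery(
            'my-project:my_dataset.my_table',      # hypothetical table
            schema='event_date:DATE,name:STRING',
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```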
   
   fastavro 0.22.2 allows writing a string value to a column defined with the date logical type (PRs fastavro/fastavro#338 and fastavro/fastavro#349), but it seems Beam pins the fastavro constraint to <0.22, so for now we can't take advantage of that.
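   For reference, a small standalone sketch (assuming fastavro >= 0.22.2 behaves as described in those PRs) of passing an ISO date string for a field whose Avro schema uses the date logical type; the record and field names are made up:

```python
import io

import fastavro

schema = fastavro.parse_schema({
    'type': 'record',
    'name': 'Row',
    'fields': [
        {'name': 'event_date', 'type': {'type': 'int', 'logicalType': 'date'}},
    ],
})

buf = io.BytesIO()
# With the <0.22 versions Beam currently pins, only datetime.date is accepted
# here; per fastavro/fastavro#338 and #349, newer releases also accept a
# date-like string such as "2020-01-01".
fastavro.writer(buf, schema, [{'event_date': '2020-01-01'}])
```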
   
   I believe your comments in CHANGES are accurate: there are some date-like and datetime-like strings that will behave differently in the Avro vs. JSON format.
