chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default 
export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614377074
 
 
   Hmm interesting, I agree keeping JSON as default is probably the safer bet.
   
   We have seen a case internally where the data provided to WriteToBigQuery is 
a string-like date, e.g., `"2020-01-01"`. When writing with JSON intermediate 
format, the data shows up as a DATE column in BigQuery, but we can't get the 
same behavior with Avro format without doing one of:
   1. Specifying the schema for that column as DATE and modifying the incoming 
PCollection to use `datetime.date`, or
   2. Specifying the schema for that column as STRING, in which case it is no 
longer a DATE column in BigQuery.
   
   The 2nd option is problematic when we're appending to an existing table, 
since we would have to modify the pipeline to keep appending to it.
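
   For reference, the PCollection change described in option 1 could look 
roughly like the sketch below. It is only an illustration: the helper name 
`parse_date_field` and the column name `event_date` are made up here, and it 
assumes each element is a dict whose date value is an ISO `YYYY-MM-DD` string.

```python
import datetime


def parse_date_field(row, field="event_date"):
    """Return a copy of `row` with `row[field]` parsed into a datetime.date.

    `field` is a hypothetical column name; rows that lack the field, or that
    already hold a non-string value there, are returned unchanged.
    """
    value = row.get(field)
    if isinstance(value, str):
        row = dict(row)  # copy so the input element is not mutated
        row[field] = datetime.date.fromisoformat(value)
    return row


row = {"id": 1, "event_date": "2020-01-01"}
converted = parse_date_field(row)
```

   In a pipeline this would presumably be applied with something like 
`beam.Map(parse_date_field)` just before WriteToBigQuery, with the column 
declared as DATE in the schema.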
   
   fastavro 0.22.2 allows writing a string value to a column defined with the 
date logical type (PRs fastavro/fastavro#338 and fastavro/fastavro#349), but it 
seems like Beam pins the fastavro constraint to <0.22, so for now we can't take 
advantage of that.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
