chunyang edited a comment on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614377074

Hmm, interesting. I agree keeping JSON as the default is probably the safer bet. We have seen a case internally where the data provided to WriteToBigQuery is a string-like date, e.g., `"2020-01-01"`. When writing with the JSON intermediate format, the data shows up as a DATE column in BigQuery, but we can't get the same behavior with the Avro format without doing one of the following (see the sketch after this list):

1. Specifying the schema for that column as DATE and modifying the incoming PCollection to use `datetime.date`, or
2. Specifying the schema for that column as STRING, in which case it is no longer a DATE column in BigQuery.

The second option is problematic when we're appending to an existing table, in which case we have to modify the pipeline to keep appending to it. fastavro 0.22.2 allows writing a string type to a column defined as a date logical type (PRs fastavro/fastavro#338 and fastavro/fastavro#349), but it seems like Beam pins the fastavro constraint to <0.22, so for now we can't take advantage of that.

I believe your comments in CHANGES are accurate: there are some date-like and datetime-like strings that will behave differently in Avro vs JSON format.
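For concreteness, a minimal sketch of option 1: the project, dataset, table, and field names are hypothetical, the `to_date` helper just illustrates converting the string-like dates to `datetime.date` before the write, and the `temp_file_format` knob is my reading of the option this PR touches (the exact name may differ in a given Beam release):

```python
import datetime

import apache_beam as beam
from apache_beam.io.gcp import bigquery_tools


def to_date(row):
    # Convert the string-like date (e.g. "2020-01-01") into a datetime.date
    # so the Avro intermediate file can encode it with the date logical type.
    row = dict(row)
    row['event_date'] = datetime.datetime.strptime(
        row['event_date'], '%Y-%m-%d').date()
    return row


with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create([{'event_date': '2020-01-01', 'user': 'alice'}])
        | beam.Map(to_date)
        | beam.io.WriteToBigQuery(
            'my-project:my_dataset.my_table',
            schema='event_date:DATE,user:STRING',
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            # Assumed: select Avro (rather than the JSON default) for the
            # intermediate load files, per the option discussed in this PR.
            temp_file_format=bigquery_tools.FileFormat.AVRO))
```

With the JSON intermediate format the original string value would have been accepted for a DATE column as-is, which is why keeping JSON as the default avoids this extra conversion step.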
