chunyang commented on issue #11433: [BEAM-9769] Ensuring JSON is the default export format for BQ sink
URL: https://github.com/apache/beam/pull/11433#issuecomment-614377074

Hmm, interesting. I agree that keeping JSON as the default is probably the safer bet.

We have seen a case internally where the data provided to WriteToBigQuery is a string-like date, e.g., `"2020-01-01"`. When writing with the JSON intermediate format, the data shows up as a DATE column in BigQuery, but we can't get the same behavior with the Avro format without doing one of the following:

1. Specifying the schema for that column as DATE and modifying the incoming PCollection to use `datetime.date`, or
2. Specifying the schema for that column as STRING, in which case it is no longer a DATE column in BigQuery.

The 2nd option is problematic when we're appending to an existing table, in which case we have to modify the pipeline to keep appending to it.

fastavro 0.22.2 allows writing a string type to a column defined with the date logical type (PRs fastavro/fastavro#338 and fastavro/fastavro#349), but it seems Beam pins the fastavro constraint to <0.22, so for now we can't take advantage of that.
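For reference, option 1 could look roughly like the sketch below. It is not taken from the PR: the helper name `coerce_date_fields`, the column name `event_date`, and the assumption that incoming dates are always `YYYY-MM-DD` strings are all hypothetical, and only the standard library is used so the conversion can be checked without a running pipeline.

```python
import datetime


def coerce_date_fields(row, date_fields):
    """Return a copy of `row` with YYYY-MM-DD strings replaced by datetime.date.

    Hypothetical helper: `row` is assumed to be a dict, as is typical for
    elements passed to WriteToBigQuery. Non-string values (including None)
    are left untouched.
    """
    converted = dict(row)  # shallow copy so the input element is not mutated
    for field in date_fields:
        value = converted.get(field)
        if isinstance(value, str):
            # Raises ValueError if the string is not in YYYY-MM-DD form.
            converted[field] = datetime.datetime.strptime(value, "%Y-%m-%d").date()
    return converted


# Example element with a string-like date, as described in the comment.
row = {"id": 1, "event_date": "2020-01-01"}
print(coerce_date_fields(row, ["event_date"]))
```

In a pipeline, something like `beam.Map(coerce_date_fields, date_fields=["event_date"])` placed ahead of WriteToBigQuery would presumably do the job, with the column declared as DATE in the schema; whether that is acceptable depends on how many pipelines would need the extra step.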
