sanxore commented on PR #25691: URL: https://github.com/apache/airflow/pull/25691#issuecomment-1220290640
> > * This Operator has no interaction with BigQuery. The goal of this Operator is to write data to the GCS file system with the right type if we write data to a binary format (like parquet), or with the right format if we write data to a string format (like csv).
>
> The schema generated by this operator uses types that are safe for BigQuery (`_write_local_schema_file` will for instance use `field_to_bigquery`, using the underlying `type_map` mapping db types to BigQuery types). This makes sure we can load into BigQuery all data exported with a `BaseSQLToGCSOperator`.
>
> Parquet has, on top of that, an additional mapping from BigQuery types to pyarrow types (see `_convert_parquet_schema`).
>
> I would expect parquet export to succeed when columns are dates, but also to be able to import the result into BigQuery with a correct schema definition. (This is how it works for the `csv` and `json` export formats.)
>
> _Note: I took a quick look, and couldn't find what changed. On my bucket I found a working extract from 19 April 2022, with `PostgresToGCSOperator` to parquet format with `date` and `datetime` in the schema. 🤔_

But unfortunately we don't have unit tests on the parquet format that could confirm it was working :(
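To make the two-step mapping discussed above concrete, here is a minimal, hypothetical sketch of what it looks like: a db-type-to-BigQuery map (in the spirit of `type_map` / `field_to_bigquery`) followed by a BigQuery-to-pyarrow-style map (in the spirit of `_convert_parquet_schema`). The dictionary contents and the helper name are illustrative assumptions, not Airflow's actual code:

```python
# Hypothetical two-step type mapping, mirroring the flow described above.
# Neither dictionary reflects Airflow's real type_map; they are examples only.

# Step 1: db types -> BigQuery-safe types (what field_to_bigquery produces).
DB_TO_BIGQUERY = {
    "date": "DATE",
    "timestamp": "TIMESTAMP",
    "varchar": "STRING",
    "int4": "INTEGER",
}

# Step 2: BigQuery types -> pyarrow-style type names (what a
# _convert_parquet_schema-like step would apply for parquet export).
BIGQUERY_TO_PARQUET = {
    "DATE": "date32",
    "TIMESTAMP": "timestamp[us]",
    "STRING": "string",
    "INTEGER": "int64",
}


def db_field_to_parquet_type(db_type: str) -> str:
    """Resolve a db column type to a parquet type via the BigQuery type.

    Unknown db types fall back to STRING, so every exported column still
    has a BigQuery-loadable type.
    """
    bq_type = DB_TO_BIGQUERY.get(db_type.lower(), "STRING")
    return BIGQUERY_TO_PARQUET[bq_type]
```

Under this sketch, a `date` column would be written as `date32` in parquet while still carrying a BigQuery-compatible `DATE` in the generated schema file, which is exactly the property the missing parquet unit tests would need to pin down.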
