sanxore commented on PR #25691:
URL: https://github.com/apache/airflow/pull/25691#issuecomment-1220298819

   > > * This Operator has no interaction with BigQuery. The goal of this Operator is to write data to the GCS file system with the right type if we write data to a binary format (like parquet), or with the right format if we write data to a string format (like csv).
   > 
   > The schema generated by this operator uses types that are safe for BigQuery (`_write_local_schema_file` will, for instance, use `field_to_bigquery`, which relies on the underlying `type_map` mapping db types to BigQuery types). This makes sure we can load into BigQuery all data exported with a `BaseSQLToGCSOperator`.
   > 
   > On top of that, parquet has an additional mapping from BigQuery types to pyarrow types (see `_convert_parquet_schema`).
   > 
   > I would expect parquet export to succeed when columns are dates, but also to be able to import the result into BigQuery with a correct schema definition. (This is how it works for the `csv` and `json` export formats.)
   > 
   > _Note: I took a quick look and couldn’t find what changed. On my bucket I found a working extract from 19 April 2022, produced with `PostgresToGCSOperator` to parquet format, with `date` and `datetime` in the schema. 🤔_
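
The two-layer mapping described above can be sketched as follows. This is a simplified, hypothetical illustration, not Airflow's actual code: the real tables live in the provider's `type_map` and in `_convert_parquet_schema`, and are larger than this.

```python
# Hypothetical, simplified mapping tables for illustration only.
DB_TO_BIGQUERY = {        # stands in for the operator's type_map
    "integer": "INTEGER",
    "varchar": "STRING",
    "date": "DATE",
    "timestamp": "TIMESTAMP",
}

BIGQUERY_TO_PARQUET = {   # stands in for the parquet-side mapping
    "INTEGER": "int64",
    "STRING": "string",
    "DATE": "date32",
    "TIMESTAMP": "timestamp[us]",
}

def db_type_to_parquet(db_type: str) -> str:
    """Map a database column type to a parquet/pyarrow type name
    via the BigQuery-safe intermediate type."""
    bq_type = DB_TO_BIGQUERY[db_type.lower()]
    return BIGQUERY_TO_PARQUET[bq_type]

print(db_type_to_parquet("date"))       # date32
print(db_type_to_parquet("timestamp"))  # timestamp[us]
```

The point of routing through the BigQuery type is that any exported file then has a schema BigQuery can ingest, whatever the source database was.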
   
   We should not compare how the json and csv formats behave with the parquet format, because they are two different file types: json and csv are string-serialized formats, while parquet is a binary format, which makes parquet type aware.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
