dclandau opened a new issue, #25474: URL: https://github.com/apache/airflow/issues/25474
### Apache Airflow Provider(s) google ### Versions of Apache Airflow Providers apache-airflow-providers-google==6.8.0 ### Apache Airflow version 2.3.2 ### Operating System Debian GNU/Linux 11 (bullseye) ### Deployment Docker-Compose ### Deployment details _No response_ ### What happened When converting postgres native data type to bigquery data types, [this](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/sql_to_gcs.py#L288) function is responsible for converting from postgres types -> bigquery types -> parquet types. The [map](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/postgres_to_gcs.py#L80) in the PostgresToGCSOperator indicates that the postgres boolean type matches to the bigquery `BOOLEAN` data type. Then when converting from bigquery to parquet data types [here](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/sql_to_gcs.py#L288), the [map](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/sql_to_gcs.py#L289) does not have the `BOOLEAN` data type in its keys. Because the type defaults to string in the following [line](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/sql_to_gcs.py#L305), the BOOLEAN data type is converted into string, which then fails when converting the data into `pa.bool_()`. When converting the boolean data type into `pa.string()` pyarrow raises an error. ### What you think should happen instead I would expect the postgres boolean type to map to `pa.bool_()` data type. Changing the [map](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/postgres_to_gcs.py#L80) to include the `BOOL` key instead of `BOOLEAN` would correctly map the postgres type to the final parquet type. ### How to reproduce 1. Create a postgres connection on airflow with id `postgres_test_conn`. 2. Create a gcp connection on airflow with id `gcp_test_conn`. 3. In the database referenced by the `postgres_test_conn`, in the public schema create a table `test_table` that includes a boolean data type, and insert data into the table. 4. Create a bucket named `issue_PostgresToGCSOperator_bucket`, in the gcp account referenced by the `gcp_test_conn`. 5. Run the dag below that inserts the data from the postgres table into the cloud storage bucket. ```python import pendulum from airflow import DAG from airflow.providers.google.cloud.transfers.postgres_to_gcs import PostgresToGCSOperator with DAG( dag_id="issue_PostgresToGCSOperator", start_date=pendulum.parse("2022-01-01"), )as dag: task = PostgresToGCSOperator( task_id='extract_task', filename='uploading-{}.parquet', bucket="issue_PostgresToGCSOperator_bucket", export_format='parquet', sql="SELECT * FROM test_table", postgres_conn_id='postgres_test_conn', gcp_conn_id='gcp_test_conn', ) ``` ### Anything else _No response_ ### Are you willing to submit PR? - [X] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
