patricker opened a new issue, #26248:
URL: https://github.com/apache/airflow/issues/26248
### Apache Airflow Provider(s)
google
### Versions of Apache Airflow Providers
8.3.0
### Apache Airflow version
2.3.4
### Operating System
OSX
### Deployment
Virtualenv installation
### Deployment details
_No response_
### What happened
I used `PostgresToGCSOperator` with `export_format='parquet'` to query a table
containing timestamp and JSON columns. The export to Parquet fails.
For the timestamp column, the error is:
```
  File "/apache-airflow/airflow/providers/google/cloud/transfers/sql_to_gcs.py", line 154, in execute
    for file_to_upload in self._write_local_data_files(cursor):
  File "/apache-airflow/airflow/providers/google/cloud/transfers/sql_to_gcs.py", line 241, in _write_local_data_files
    tbl = pa.Table.from_pydict(row_pydic, parquet_schema)
  File "pyarrow/table.pxi", line 1724, in pyarrow.lib.Table.from_pydict
  File "pyarrow/table.pxi", line 2385, in pyarrow.lib._from_pydict
  File "pyarrow/array.pxi", line 341, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 315, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: object of type <class 'str'> cannot be converted to int
```
For the JSON column, the traceback is the same except for the final error:
```
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'dict' object
```
### What you think should happen instead
Both timestamp and JSON data types should be supported when exporting to Parquet.
### How to reproduce
With `export_format='parquet'`, selecting any timestamp or JSON column
reproduces the issue.
### Anything else
While troubleshooting this, I came to believe the fix has to be added to every
subclass of `BaseSQLToGCSOperator`. Their `convert_type` methods convert
timestamps, dates, and times to strings. That works for CSV and JSON, but breaks
Parquet, because pyarrow then receives a string where the Parquet schema expects
a non-string value (hence the `str` to `int` error above). For Parquet, these
values should instead be returned as-is.
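A minimal sketch of that idea in plain Python, not Airflow's actual code: the helper `convert_type_for_export` and its `export_format` argument are hypothetical (the real `convert_type` methods do not receive the export format), and the ISO-8601 strings are a simplification of what the real operators emit. Temporal values are left untouched for Parquet so pyarrow can map them onto its own date/time types, and are stringified only for CSV and JSON.

```python
import datetime
from typing import Any


def convert_type_for_export(value: Any, export_format: str) -> Any:
    """Hypothetical format-aware variant of a `convert_type` method."""
    # datetime.datetime is a subclass of datetime.date, so this single
    # isinstance check covers datetimes, dates, and times.
    if isinstance(value, (datetime.date, datetime.time)):
        if export_format == "parquet":
            # Keep the Python object; pyarrow converts it natively.
            return value
        # CSV/JSON: serialize to an ISO-8601 string (simplified).
        return value.isoformat()
    # Every other value passes through unchanged in this sketch.
    return value


ts = datetime.datetime(2022, 9, 8, 12, 30)
print(convert_type_for_export(ts, "csv"))      # 2022-09-08T12:30:00
print(convert_type_for_export(ts, "parquet"))  # 2022-09-08 12:30:00
```

The design choice here is to branch on the output format rather than on the column type alone, since the same `datetime` value is fine as a string in CSV but must stay a `datetime` for a Parquet timestamp column.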
As for JSON, my guess is that simply setting `stringify_dict=True` for Parquet
exports would fix it.
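A rough sketch of what that guess amounts to, assuming the JSON column ends up in a Parquet string column (which is what the `Expected bytes, got a 'dict' object` error suggests). `prepare_row_for_export` is a hypothetical helper, and the choice to keep dicts as objects for JSON exports is an assumption of this sketch rather than confirmed Airflow behavior.

```python
import json
from typing import Any, List


def prepare_row_for_export(row: List[Any], export_format: str) -> List[Any]:
    """Hypothetical helper applying the `stringify_dict` idea per export format."""
    # Assumption: CSV and Parquet need text, while a JSON export can nest objects.
    stringify_dict = export_format in ("csv", "parquet")
    return [
        # Serialize dicts so a Parquet string column receives a str, not a dict.
        json.dumps(value) if stringify_dict and isinstance(value, dict) else value
        for value in row
    ]


print(prepare_row_for_export([1, {"a": 1}, "x"], "parquet"))  # [1, '{"a": 1}', 'x']
```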
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.