juarezr commented on code in PR #57514:
URL: https://github.com/apache/airflow/pull/57514#discussion_r2483111520
##########
providers/google/src/airflow/providers/google/cloud/transfers/mssql_to_gcs.py:
##########
@@ -67,7 +67,7 @@ class MSSQLToGCSOperator(BaseSQLToGCSOperator):
ui_color = "#e0a98c"
- type_map = {2: "BOOLEAN", 3: "INTEGER", 4: "TIMESTAMP", 5: "NUMERIC"}
+ type_map = {2: "BOOL", 3: "INTEGER", 4: "TIMESTAMP", 5: "NUMERIC"}
Review Comment:
Hi @MaksYermak ,
- Currently, the only way to export MSSQL columns of type BIT as BOOLEAN
columns in a parquet file is to list the column names in the `bit_fields`
property. Otherwise, they are exported with the INT datatype.
- However, when `bit_fields` is set, the data transfer fails with the exception
described in the summary above.
- This change fixes that bug, which is caused by a mismatch between this
`type_map` definition and the place where its values are used to determine the
column type.
To make the mismatch easy to see, I'm quoting below the text from issue #57461,
which shows both code locations:
----
The issue is in the `MSSQLToGCSOperator` class at line 70:
```python
# Current (incorrect) implementation:
class MSSQLToGCSOperator(BaseSQLToGCSOperator):
    type_map = {2: "BOOLEAN", 3: "INTEGER", 4: "TIMESTAMP", 5: "NUMERIC"}
    #               ^^^^^^^
    #               This should be "BOOL"
```
The `BaseSQLToGCSOperator._convert_parquet_schema()` method expects `"BOOL"`
for boolean types, but `MSSQLToGCSOperator` maps bit fields (type 2) to
`"BOOLEAN"`. This mismatch causes PyArrow to fail when converting the schema:
```python
# Type map in base class BaseSQLToGCSOperator:
def _convert_parquet_schema(self, cursor):
    type_map = {
        "INTEGER": pa.int64(),
        "FLOAT": pa.float64(),
        "NUMERIC": pa.float64(),
        "BIGNUMERIC": pa.float64(),
        "BOOL": pa.bool_(),  # This is the correct key, not "BOOLEAN"
        "STRING": pa.string(),
        "BYTES": pa.binary(),
        "DATE": pa.date32(),
        "DATETIME": pa.date64(),
        "TIMESTAMP": pa.timestamp("s"),
    }
```
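The mismatch between the two maps can also be checked mechanically. The sketch
below is only an illustration, not code from the provider: the names
`PARQUET_TYPE_KEYS`, `OLD_TYPE_MAP`, `NEW_TYPE_MAP` and `unmatched_type_names`
are hypothetical, and the key set is copied from the base-class excerpt quoted
above. It makes no claim about how the base class handles a missing key; it only
lists the type names that have no entry there.

```python
# Keys of the parquet type map quoted from BaseSQLToGCSOperator above.
# (Hypothetical constant name; the PyArrow values are not needed for this check.)
PARQUET_TYPE_KEYS = {
    "INTEGER", "FLOAT", "NUMERIC", "BIGNUMERIC", "BOOL",
    "STRING", "BYTES", "DATE", "DATETIME", "TIMESTAMP",
}

# MSSQLToGCSOperator.type_map before and after this PR (copied from the diff).
OLD_TYPE_MAP = {2: "BOOLEAN", 3: "INTEGER", 4: "TIMESTAMP", 5: "NUMERIC"}
NEW_TYPE_MAP = {2: "BOOL", 3: "INTEGER", 4: "TIMESTAMP", 5: "NUMERIC"}


def unmatched_type_names(operator_type_map, parquet_keys):
    """Return the operator's type names that have no entry in the parquet map."""
    return sorted(set(operator_type_map.values()) - set(parquet_keys))


print(unmatched_type_names(OLD_TYPE_MAP, PARQUET_TYPE_KEYS))  # ['BOOLEAN']
print(unmatched_type_names(NEW_TYPE_MAP, PARQUET_TYPE_KEYS))  # []
```

A check along these lines could, in principle, be turned into a unit test so
that a similar naming mismatch is caught before it reaches a parquet export.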
----
I haven't tried exporting to the JSON and CSV formats to check whether they
fail, but:
- Exporting BIT fields to parquet only works with this fix.
- I have run the existing tests for this operator, and all of them passed.
I hope that I have answered your question properly.