juarezr opened a new pull request, #57514:
URL: https://github.com/apache/airflow/pull/57514

   
   
   ---
   
   # Fix MSSQLToGCSOperator MSSQL BIT data type conversion to Parquet boolean 
closes #57461
   
   ## Summary
   
   Fixes `ArrowTypeError: Expected bytes, got a 'bool' object` when exporting 
MSSQL bit fields to Parquet format using `MSSQLToGCSOperator`.
   
   ### Issue
   
   - Issue: #57461
   - Problem: The `MSSQLToGCSOperator` incorrectly mapped MSSQL bit fields 
(type 2) to `"BOOLEAN"` in the `type_map`, but the base class 
`BaseSQLToGCSOperator._convert_parquet_schema()` expects `"BOOL"` for boolean 
types.
   
   ### Root Cause
   
   The `type_map` class attribute in `MSSQLToGCSOperator` had an incorrect type
mapping:
   
   - **Before**: `type_map = {2: "BOOLEAN", ...}`
   - **After**: `type_map = {2: "BOOL", ...}`
   
   This mismatch caused PyArrow schema conversion to fail when processing bit 
fields in Parquet format exports.
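To illustrate why a single name mismatch matters, the sketch below mimics a lookup that falls back to a string column when it does not recognise a type name. It assumes that `_convert_parquet_schema()` behaves this way. The names `PARQUET_TYPES`, `OLD_TYPE_MAP`, `NEW_TYPE_MAP`, and `parquet_type_for()` are hypothetical, and plain strings stand in for the PyArrow types.

```python
# Hypothetical stand-ins: the real conversion uses PyArrow type objects, not strings.
PARQUET_TYPES = {
    "INTEGER": "int64",
    "NUMERIC": "float64",
    "BOOL": "bool",
    "STRING": "string",
    "TIMESTAMP": "timestamp[s]",
}

# Only the bit entry (type code 2) is shown; other entries are omitted here.
OLD_TYPE_MAP = {2: "BOOLEAN"}
NEW_TYPE_MAP = {2: "BOOL"}


def parquet_type_for(type_map: dict[int, str], type_code: int) -> str:
    """Resolve an MSSQL type code to a (stand-in) Parquet type name."""
    # Assumption: unmapped type codes are treated as "STRING".
    type_name = type_map.get(type_code, "STRING")
    # Assumption: unrecognised names silently fall back to a string column.
    return PARQUET_TYPES.get(type_name, "string")


if __name__ == "__main__":
    print(parquet_type_for(OLD_TYPE_MAP, 2))
    print(parquet_type_for(NEW_TYPE_MAP, 2))
```

Under these assumptions, the old mapping resolves type code 2 to a string column, which is consistent with the reported `Expected bytes, got a 'bool' object` error. The new mapping resolves it to a boolean column.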
   
   ### Impact
   
   - **Affected Users**: Users exporting MSSQL bit fields to Parquet format 
using `MSSQLToGCSOperator`
   - **Breaking Changes**: None (this is a bug fix)
   - **Other Export Formats**: CSV and JSON formats are unaffected (they don't 
use this type mapping)
   
   ### Related Issues/PRs
   
   closes: #57461
   related: #29902 #11874
   
   ### Additional Notes
   
   Users can temporarily work around this issue by creating a custom operator
that overrides the `type_map` class attribute:
   
   ```python
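   from airflow.providers.google.cloud.transfers.mssql_to_gcs import MSSQLToGCSOperator

   class FixedMSSQLToGCSOperator(MSSQLToGCSOperator):
       # Type code 2 is the MSSQL bit type; "BOOL" is the name the Parquet conversion expects.
       type_map = {2: "BOOL", 3: "INTEGER", 4: "TIMESTAMP", 5: "NUMERIC"}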
   ```
   
   However, this fix makes the workaround unnecessary.
   
   ## Changes Made
   
   ### 1. Fixed Type Mapping (`mssql_to_gcs.py`)
   
   - Changed `type_map` from `{2: "BOOLEAN"}` to `{2: "BOOL"}` to match the 
expected type key in `BaseSQLToGCSOperator._convert_parquet_schema()`
   
   ### 2. Updated Tests (`test_mssql_to_gcs.py`)
   
   - Updated `SCHEMA_JSON` and `SCHEMA_JSON_BIT_FIELDS` constants to use 
`"BOOL"` instead of `"BOOLEAN"` to match the fix
   - Added new test `test_exec_success_parquet_with_bit_fields()` to verify 
that bit fields can be exported to Parquet format without errors
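As a complementary idea, a consistency check could catch this class of mismatch before any export runs. The sketch below is not the test added in this PR. The helper `unrecognised_type_names()` and the set `KNOWN_NAMES` are hypothetical, and `KNOWN_NAMES` is only an illustrative subset of the names the Parquet conversion is assumed to accept.

```python
def unrecognised_type_names(type_map: dict[int, str], known_names: set[str]) -> set[str]:
    """Return the type names in type_map that are missing from known_names."""
    return set(type_map.values()) - known_names


# Assumption: an illustrative subset of names accepted by the Parquet schema conversion.
KNOWN_NAMES = {"INTEGER", "NUMERIC", "BOOL", "STRING", "TIMESTAMP"}


def test_fixed_type_map_names_are_recognised():
    fixed_map = {2: "BOOL", 3: "INTEGER", 4: "TIMESTAMP", 5: "NUMERIC"}
    assert unrecognised_type_names(fixed_map, KNOWN_NAMES) == set()


def test_old_bit_mapping_is_flagged():
    assert unrecognised_type_names({2: "BOOLEAN"}, KNOWN_NAMES) == {"BOOLEAN"}
```

A check of this shape compares names only, so it runs without a database connection or PyArrow installed.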
   
   ### Files Changed
   
   1. `providers/google/src/airflow/providers/google/cloud/transfers/mssql_to_gcs.py` (line 70)
      - Changed `type_map = {2: "BOOLEAN", ...}` to `type_map = {2: "BOOL", ...}`
   
   2. `providers/google/tests/unit/google/cloud/transfers/test_mssql_to_gcs.py`
      - Updated schema constants to use `"BOOL"` instead of `"BOOLEAN"`
      - Added `test_exec_success_parquet_with_bit_fields()` test
   
   ## Testing
   
   The fix has been tested and verified:
   
   - ✅ Tested manually with a DAG exporting MSSQL bit fields to Parquet format
   - ✅ Unit tests updated to reflect the correct type mapping
   - ✅ New test case added to prevent regression
   
   Command to test the changes:
   
   ```sh
   $ breeze testing providers-tests --skip-db-tests \
       providers/google/tests/unit/google/cloud/transfers/test_mssql_to_gcs.py
   ...
   ```
   

