Fahad-Alam-Jamal opened a new pull request, #59151:
URL: https://github.com/apache/airflow/pull/59151

   ## Fix DatabricksSqlOperator XCom pickle serialization
   
   closes: #59103
   
   ### Description
   This PR fixes a failure in `DatabricksSqlOperator`, which raises 
`_pickle.PicklingError: Can't pickle <class 
'airflow.providers.databricks.hooks.databricks_sql.Row'>` when XCom push is 
enabled (`do_xcom_push=True`).
   
   ### Root Cause
   The Databricks SQL connector returns `databricks.sql.types.Row` objects, 
whose classes are created dynamically and therefore cannot be pickled. XCom 
requires all return values to be picklable for storage in the Airflow metadata 
database. With the default `fetch_all_handler`, these unpicklable `Row` objects 
were returned directly, without conversion.
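
The failure mode described above can be reproduced with the standard library alone. The sketch below is an illustration, not code from the hook: `make_row` is a hypothetical helper that builds a namedtuple class inside a function, so the class is never bound to a module-level name and pickle cannot resolve it by name.

```python
import pickle
from collections import namedtuple


def make_row(fields, values):
    # Hypothetical helper: a fresh class is created on every call and is
    # never assigned to a module-level name, so pickle cannot look it up.
    row_cls = namedtuple("Row", fields, rename=True)
    return row_cls(*values)


row = make_row(["id", "name"], [1, "a"])

try:
    pickle.dumps(row)
    pickled_ok = True
    error_text = ""
except (pickle.PicklingError, AttributeError) as exc:
    pickled_ok = False
    error_text = str(exc)

# Converting to a plain tuple drops the dynamic class and pickles fine,
# at the cost of losing the field names.
plain = pickle.loads(pickle.dumps(tuple(row)))
```

Pickle stores classes by reference (module plus name), which is why an instance of a class that exists only as a local variable cannot be serialized.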
   
   ### Solution
   Introduced a new `PicklableRow` wrapper class in `DatabricksSqlHook` that:
   - Wraps unpicklable Row objects and makes them picklable via a custom 
`__reduce__` method
   - Maintains full backward compatibility by delegating to an internal 
namedtuple
   - Supports all namedtuple interface operations: `_fields`, `_asdict()`, 
iteration, and attribute access
   - Properly handles field name renaming for invalid Python identifiers (e.g., 
`count(1)` → `_0`)
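
The wrapper idea in the list above could look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's actual `PicklableRow`: the class name `PicklableRowSketch`, its constructor signature, and the choice to key `_asdict()` by the original column names are all illustrative.

```python
import pickle
from collections import namedtuple


class PicklableRowSketch:
    """Illustrative wrapper only; the real PicklableRow may differ."""

    def __init__(self, fields, values):
        self._original_fields = tuple(fields)
        # rename=True replaces invalid identifiers with positional
        # names such as _0, _1, ...
        row_cls = namedtuple("Row", self._original_fields, rename=True)
        self._inner = row_cls(*values)

    @property
    def _fields(self):
        # Renamed field names, as exposed by the inner namedtuple
        return self._inner._fields

    def _asdict(self):
        # Assumption: keys are the original column names
        return dict(zip(self._original_fields, self._inner))

    def __iter__(self):
        return iter(self._inner)

    def __getattr__(self, name):
        # Called only when normal lookup fails; delegate field access to
        # the namedtuple, but never recurse on the wrapper's own state.
        if name in ("_inner", "_original_fields") or name.startswith("__"):
            raise AttributeError(name)
        return getattr(self._inner, name)

    def __reduce__(self):
        # Pickle stores this module-level class plus plain tuples, never
        # the dynamically created namedtuple class.
        return (type(self), (self._original_fields, tuple(self._inner)))


row = PicklableRowSketch(["count(1)", "name"], [42, "a"])
restored = pickle.loads(pickle.dumps(row))
```

Because `__reduce__` returns only a module-level class and built-in tuples, unpickling rebuilds the inner namedtuple from scratch instead of trying to import it.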
   
   ### Changes
   - **Hook**: Modified `DatabricksSqlHook.run()` to always convert `Row` 
objects to `PicklableRow`, even when no handler is provided
   - **Hook**: Updated `_make_common_data_structure()` to use `PicklableRow` 
instead of dynamic namedtuples
   - **Tests**: Added `test_xcom_pickle_results_with_row_objects()` to verify 
pickle serialization works correctly
   - **Backward Compatibility**: All 35 existing unit tests pass, confirming no 
breaking changes
   
   ### Testing
   - ✅ All 35 unit tests pass, including the new pickle test
   - ✅ Verified `pickle.dumps()` and `pickle.loads()` work correctly on 
converted `Row` objects
   - ✅ Confirmed `_fields` attribute returns properly renamed field names
   - ✅ Verified `_asdict()` method returns dictionaries with original field 
names
   

