techcodie commented on PR #63775: URL: https://github.com/apache/airflow/pull/63775#issuecomment-4132032475
Hello everyone, I've finalized the PR and it's now ready for review. I've performed a comprehensive cleanup and addressed all previous feedback: ### Final Summary of Work 1. **Avoided `AsyncToSync` error**: Refactored [BaseDatabricksHook](cci:2://file:///Users/anshbaheti/airflow/airflow/providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:100:0-1304:20) and [DatabricksHook](cci:2://file:///Users/anshbaheti/airflow/airflow/providers/databricks/src/airflow/providers/databricks/hooks/databricks.py:267:0-934:24) to resolve connections asynchronously upfront. 2. **Cache Pre-warming**: Implemented cache pre-warming in [__aenter__](cci:1://file:///Users/anshbaheti/airflow/airflow/providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:237:4-240:19) for the `@cached_property` [databricks_conn](cci:1://file:///Users/anshbaheti/airflow/airflow/providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:172:4-174:89). This ensures subsequent sync attribute lookups (like `self.host`) hit the cached object directly, preventing the blocking [get_connection()](cci:1://file:///Users/anshbaheti/airflow/airflow/providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:272:4-277:19) call in the triggerer process. 3. **Mirror Async Path**: Added async versions for all connection-dependent methods ([a_get_run](cci:1://file:///Users/anshbaheti/airflow/airflow/providers/databricks/src/airflow/providers/databricks/hooks/databricks.py:573:4-582:23), [a_check_azure_metadata_service](cci:1://file:///Users/anshbaheti/airflow/airflow/providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:999:4-1021:77), [a_get_aad_token](cci:1://file:///Users/anshbaheti/airflow/airflow/providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:411:4-457:34), etc.) and updated the Databricks triggers to use them. 4. **Adoption of Public APIs**: Switched to `get_async_connection()` from `airflow.providers.common.compat` for robust cross-version support, avoiding private SDK internal APIs. 5. **Clean History**: Squashed development commits into a single, clean commit focused solely on the Databricks provider. Removed all unrelated changes to other providers/newsfragments. 6. **Fixes**: Included incidental fixes (async for retries, `aiohttp.ClientResponseError` handling, SQL trigger timeout cancellation). ### Validation Results The fix has been verified against a real Databricks workspace using `DatabricksSubmitRunOperator` with `deferrable=True`. - **Triggerer Lifecycle**: Submited job -> DEFERRED -> Polled (PENDING -> RUNNING -> TERMINATED) -> fired completion -> SUCCESS. - **No AsyncToSync error**: The triggerer picked up the deferred task and completed it successfully without any IPC or sync-in-async errors. - **Unit Tests**: Verified with `pytest` and `mypy` for the [databricks](cci:1://file:///Users/anshbaheti/airflow/airflow/providers/databricks/src/airflow/providers/databricks/hooks/databricks_base.py:172:4-174:89) provider (all 107 local tests pass). The PR is now in its final, mergeable state. Looking forward to your thoughts! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
