deepujain opened a new pull request, #63206:
URL: https://github.com/apache/airflow/pull/63206

   ## Summary
   
   Fixes the DagBag timeout triggered when DAGs import `DataprocCreateBatchOperator` 
(#62373). Importing this operator previously pulled in the full 
`operators.dataproc` module and its heavy dependencies 
(`google.cloud.dataproc_v1`, `DataprocHook`, triggers), causing parse times of 
30+ seconds on small workers.
   
   ## Change
   
   - **Lazy-load `DataprocCreateBatchOperator`:** Turned `operators.dataproc` 
into a package. `DataprocCreateBatchOperator` is provided from a lightweight 
`._batch` submodule that defers `google.cloud.dataproc_v1`, `DataprocHook`, and 
related imports until they are first needed at runtime (for example, in 
`execute()` or when `hook` is accessed). All other operators remain in 
`._core` and are loaded on first access.
   - **`dataproc/__init__.py`:** Uses `__getattr__` to return 
`DataprocCreateBatchOperator` from `._batch` and other names from `._core`.
   - **`dataproc/_batch.py`:** Contains only `DataprocCreateBatchOperator` with 
local imports for heavy deps inside methods.
   - **`dataproc/_core.py`:** Previous `dataproc.py` content minus the Batch 
operator class.
   - **Tests:** `DATAPROC_PATH` now points at `._core` for non-Batch operators. 
`TestDataprocCreateBatchOperator` uses `DATAPROC_BATCH_HOOK_PATH` and 
`DATAPROC_BATCH_TO_DICT_PATH` so that mocks patch the modules the Batch operator 
actually imports from (the hooks module and `google.cloud.dataproc_v1`).
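
For readers unfamiliar with the pattern, the package-level dispatch described above can be sketched with a module-level `__getattr__` (PEP 562). This is a minimal, self-contained illustration and not the code in this PR: the package name `demo_dataproc`, the `_BATCH_NAMES` set, the stand-in class bodies, and the `demo()` helper are all hypothetical, and `json` stands in for the heavy Google dependencies.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical __init__.py: route attribute access to a light or heavy submodule.
PACKAGE_INIT = '''
import importlib

_BATCH_NAMES = {"DataprocCreateBatchOperator"}

def __getattr__(name):
    # PEP 562: invoked only when normal lookup on the package fails.
    submodule = "._batch" if name in _BATCH_NAMES else "._core"
    module = importlib.import_module(submodule, __name__)
    try:
        value = getattr(module, name)
    except AttributeError:
        raise AttributeError(
            f"module {__name__!r} has no attribute {name!r}"
        ) from None
    globals()[name] = value  # cache so later lookups skip __getattr__
    return value
'''

# Hypothetical light submodule: heavy imports are deferred into the method.
BATCH_MODULE = '''
class DataprocCreateBatchOperator:
    def execute(self):
        import json  # stand-in for a heavy dependency
        return json.dumps({"ok": True})
'''

# Hypothetical heavy submodule holding every other operator.
CORE_MODULE = '''
class DataprocSubmitJobOperator:
    pass
'''


def demo(package_name="demo_dataproc"):
    """Build the throwaway package, import it, and report what got loaded."""
    core_key = f"{package_name}._core"
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package_name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text(PACKAGE_INIT)
        (pkg_dir / "_batch.py").write_text(BATCH_MODULE)
        (pkg_dir / "_core.py").write_text(CORE_MODULE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module(package_name)
            batch_cls = pkg.DataprocCreateBatchOperator
            core_after_batch = core_key in sys.modules
            _ = pkg.DataprocSubmitJobOperator
            core_after_core = core_key in sys.modules
            return {
                "batch_result": batch_cls().execute(),
                "core_loaded_after_batch_access": core_after_batch,
                "core_loaded_after_core_access": core_after_core,
            }
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            for key in [k for k in sys.modules
                        if k == package_name or k.startswith(package_name + ".")]:
                del sys.modules[key]


if __name__ == "__main__":
    print(demo())
```

In this sketch, accessing the Batch operator never imports `_core`; the heavier submodule is loaded only when one of the other names is requested.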
   
   ## Why no new tests
   
   Existing `TestDataprocCreateBatchOperator` tests were updated to patch the 
correct modules and continue to cover the operator; no new test file was added.
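
The reason the patch targets had to move can be shown with a small sketch. When a name is imported inside a method, it is looked up on the source module at call time, so the mock has to be applied to that source module rather than to the module that defines the operator. Everything below is illustrative: `heavy_dep_demo`, `to_dict`, and `LazyBatchOperator` are hypothetical stand-ins, not the identifiers used in the Airflow test suite.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for a heavy dependency, registered as an importable module.
heavy = types.ModuleType("heavy_dep_demo")
heavy.to_dict = lambda obj: {"real": obj}
sys.modules["heavy_dep_demo"] = heavy


class LazyBatchOperator:
    """Hypothetical operator that defers its heavy import until execute()."""

    def execute(self, batch):
        # The name is resolved on heavy_dep_demo each time execute() runs,
        # so there is no module-level `to_dict` attribute to patch here.
        from heavy_dep_demo import to_dict

        return to_dict(batch)


def run_with_patched_source():
    # Patch the attribute on the source module, where the local import looks it up.
    with mock.patch("heavy_dep_demo.to_dict", return_value={"mocked": True}) as fake:
        result = LazyBatchOperator().execute("batch")
    return result, fake.call_count


if __name__ == "__main__":
    print(LazyBatchOperator().execute("batch"))
    print(run_with_patched_source())
```

With a module-level import the usual advice is to patch the name where it is used; with a local import inside a method there is no such module-level name, which is why the path constants in the tests point at the source modules instead.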
   
   Fixes #62373
   

