GitHub user Rishabh1627rawat created a discussion: Using 
DatabricksSubmitRunOperator inside @task — is pool applied correctly


Hi everyone,

I’m using Airflow 2.x with the `@task` decorator (TaskFlow API), and I’m trying 
to better understand how Airflow handles execution and pools when an operator 
is called inside a Python task.

Right now, I’m using this pattern:

```python
from airflow.decorators import task
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator


@task(pool="databricks_superset_med", retries=3)
def run_databricks(run_payload, **context):
    # Instantiate the operator inside the task body and invoke it manually
    op = DatabricksSubmitRunOperator(
        task_id="data_transformation",
        databricks_conn_id="databricks",
        json=run_payload,
        wait_for_termination=True,
    )
    return op.execute(context=context)
```

In this setup:

* The `@task` has a pool assigned.
* Inside that function, I create a `DatabricksSubmitRunOperator`.
* I manually call `op.execute(context=context)`.

This successfully triggers my Databricks notebook, and because 
`wait_for_termination=True`, the task waits until the notebook run finishes.

However, I want to better understand what is happening internally.

Specifically:

* Is the pool applied only to the outer Python `@task`?
* From the scheduler’s perspective, is the `DatabricksSubmitRunOperator` 
treated as a separate task?
* Or is it completely invisible because it is executed manually inside the 
Python task?
* If I set a `pool` on the inner operator instead, why does it not take effect?
* Would it be better practice to define `DatabricksSubmitRunOperator` directly 
in the DAG instead of wrapping it inside a `@task`?
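For reference, the DAG-level alternative I have in mind would look roughly like the sketch below. This is just my assumption of how it would be wired up; the upstream `build_payload` task and the way its XCom feeds `json` are hypothetical, not part of my current DAG:

```python
from airflow.decorators import task
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Hypothetical upstream task that builds the run payload and passes it via XCom.
@task
def build_payload():
    return {"notebook_task": {"notebook_path": "/my/notebook"}}

# Declared at the DAG level, so the scheduler sees this operator as its own
# task instance: the pool slot is acquired for the Databricks run itself,
# and retries/UI state apply to this task directly.
run_databricks = DatabricksSubmitRunOperator(
    task_id="data_transformation",
    databricks_conn_id="databricks",
    json=build_payload(),  # TaskFlow output rendered as the operator's json
    pool="databricks_superset_med",
    retries=3,
    wait_for_termination=True,
)
```

My understanding is that in this version the pool, retries, and lifecycle tracking all attach to the Databricks task itself rather than to a wrapping Python task, but I'd appreciate confirmation.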

I understand that manually calling `.execute()` may bypass some of Airflow’s 
orchestration mechanisms. I’m especially curious how this pattern affects:

* Pool slot acquisition
* Task lifecycle tracking
* Scheduler awareness
* UI visibility

This is not a functional issue — everything runs successfully. I just want to 
understand the internal behavior better and ensure I’m following best practices 
without unintentionally bypassing Airflow’s concurrency controls.

I would appreciate any clarification on how the scheduler treats this pattern 
internally.

GitHub link: https://github.com/apache/airflow/discussions/62403
