And just to add - I would not draw too many analogies between Spark SDK and Airflow beyond the underlying "general IT design principles", which are sound regardless of the particular SDK implementation.
Generally, the principles that we follow in the Task SDK apply to literally **any** SDK where you want to define a stable API for whoever the user is. It also follows the principles of basically **any** SDK built on a client-server architecture where you want to make use of HTTP/security/proxy infrastructure and fine-grained permissions granted to the client. Those are just "common practices" we implemented with the Task SDK. There was no particular inspiration from, or parallel to, Spark SDK.

Moreover, there are significant differences vs Spark on the "logic" level of the APIs, and I would definitely not compare the two, because people might be misguided if you say "it's like Spark SDK". Spark SDK and the Airflow Task SDK are fundamentally different (on a logic level). While the underlying technologies and principles are similar (decoupling of the client from server code, a very clear boundary, and an "SDK" that exposes only what can really be done from the client), there are fundamental differences in what we do in Airflow.

The main stark difference is the direction of workload submission and execution, which is basically 100% reversed between Spark and Airflow:

* In Spark, the "server" executes the workloads submitted by the client - because the Spark server is a workflow execution engine.
* In Airflow, the "server" is the one that submits workflows to be executed by the client - because the Airflow server is an orchestrator that tells its workers what to do (and often those tasks delegate the work to other servers like Spark, which means the worker often acts as a client for both: the orchestrating engine of Airflow and an execution engine such as Spark).

This is a 100% reversal of control - even if the underlying low-level principles (isolation, decoupling, HTTP communication, security) are similar - but mostly because they just make sense in general engineering, not because those two SDKs do things in a similar way. They don't.
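To make the "stable authoring surface" part of this a bit more concrete, here is a minimal sketch of a Dag written only against the Task SDK (assuming the Airflow 3 `airflow.sdk` import path; the dag and task names and bodies are made up for illustration). The author's code never imports scheduler or executor internals; at run time a worker process executes it and reports state back to the server over the Task Execution API, which is the "reversed" direction described above.

```python
# Minimal sketch (illustration only): the Dag author imports solely from
# airflow.sdk - the stable, user-facing authoring surface - and never from
# scheduler/executor internals. Names and task bodies here are hypothetical.
from airflow.sdk import dag, task


@dag(schedule=None)
def example_pipeline():
    @task
    def extract() -> dict:
        # Pretend we pulled something from an upstream system.
        return {"rows": 42}

    @task
    def load(payload: dict) -> None:
        print(f"Loading {payload['rows']} rows")

    load(extract())


example_pipeline()
```

Which executor actually runs these tasks is an orchestration-side concern; the authoring code above does not need to change, which is the decoupling discussed in the quoted thread below.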
J.

On Sun, Nov 23, 2025 at 10:57 AM Amogh Desai <[email protected]> wrote:

> Answering as one of the Airflow developers contributing to the Task SDK.
>
> Q1: If Engine = Execution and API = Server side, the analogy is comparable.
> The goal of the Task SDK is to decouple Dag authoring from Airflow
> internals and provide a version-agnostic, stable interface for writing
> Dags.
>
> Q2: Yes, that's the intention. Custom executors might require some
> adaptation when adopting AF3 the first time, because Airflow 3 deals in
> *workloads*
> <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html#workloads>
> vs CLI commands in < 3.0.
>
> Q3: You can compare / draw relations provided that the comparison is in
> the context of the server-client separation and future-proofing consumers
> against internal changes.
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Sat, Nov 22, 2025 at 10:10 AM Kyungjun Lee <[email protected]>
> wrote:
>
> > Hi Airflow developers,
> >
> > I’ve been studying the Airflow *Task SDK* in detail, and I find its
> > direction very interesting - especially the idea of introducing a
> > stable, user-facing API layer that is decoupled from the internal
> > executor, scheduler, and runtime behavior.
> >
> > While going through the design notes and recent changes around the Task
> > SDK, it reminded me of the architectural philosophy behind *Apache
> > Spark Connect*, which also emphasizes:
> >
> > - separating user-facing APIs from the underlying execution engine
> > - providing a stable long-term public API surface
> > - enabling flexible execution models
> > - reducing coupling between API definitions and the actual runtime
> >   environment
> >
> > This made me wonder whether the philosophical direction is similar or
> > if I am drawing an incorrect analogy.
> > I would like to ask a few questions to better understand Airflow’s
> > long-term intent:
> >
> > *Q1.*
> > Is the Task SDK intentionally aiming for a form of *API-engine
> > decoupling* similar to Spark Connect? Or is the motivation
> > fundamentally different?
> >
> > *Q2.*
> > Is the long-term vision that tasks will be defined through a stable
> > Task SDK interface while the underlying scheduler/executor
> > implementations evolve independently without breaking user code?
> >
> > *Q3.*
> > https://issues.apache.org/jira/browse/SPARK-39375 # spark-connect
> >
> > From the perspective of the Airflow dev community, does it make sense
> > to compare Task SDK ↔ Spark Connect, or is the architectural direction
> > of Airflow fundamentally different?
> >
> > I’m asking these questions because I want to *better understand the
> > philosophy that Airflow is trying to pursue*, and confirm whether my
> > interpretation of the Task SDK direction is accurate.
> >
> > Any insights or clarifications would be greatly appreciated.
> > Thank you for your continued work on Airflow.
> >
> > Best regards,
> > *Kyungjun Lee*
> >
