Funny you should say "step", I have an AIP coming along on that front, so
please don't name it step -- specificall about "Task Steps". I should have
it out soon.

On Thu, 4 Jun 2026 at 23:03, Jarek Potiuk <[email protected]> wrote:

> Hi David and everyone,
>
> I agree with Ash; the "Dynamic" part of the current naming is quite
> misleading. I’ve been giving this some thought, and I believe we might be
> looking for the wrong thing to name (batch) because we are looking at it
> from the wrong angle.
>
> (And yes a lot of this comes after some chatting with Claude about my gut
> feelings).
>
> To address concerns about communicating resilience and durability, we
> should consider this from the perspective of the "durability unit."
> Currently, Airflow has three such units:
>
> 1.  Task: Regular tasks.
> 2.  Mapped Task: Individual elements in a task array.
> 3.  Deferred Task: Sub-task parts stored as state in the Triggerer DB.
>
> In all three cases, if the infrastructure fails, the entire unit is lost
> and must restart from the beginning. David's new construct idea doesn't
> change these execution-layer properties. Whether it’s a single task or a
> mapped task, the "unit of durability" remains the same, even if the "unit
> of work" within it changes.
>
> Therefore, we should use a different name for that "construct"—one that
> clearly indicates these sequential actions are not tasks and do not carry
> task-level durability guarantees. After exploring some options, I think
> "step" is the most effective term. It has strong precedent in CI systems
> (GitHub Actions, GitLab) and AWS Step Functions, where a job/task is the
> durable unit and steps are subordinate, sequential, and ephemeral. Other
> alternatives like "action," "leg," or "segment" are possible, but "step"
> offers instant comprehension.
>
> Using "step" would also allow for a clean syntax that distinguishes it from
> the @task decorator. For example:
>
> @step(retries=2)
> def load(item):
>     warehouse.write(item)
>
> @task
> def ingest_orders(data):
>     load.map(data)
>
> Or even do things asynchronously - not sequentially
>
> @step(retries=2)
> async def load(item):
>     await warehouse.write(item)
>
> @task
> async def ingest_orders(data):
>     load.map(data) # map normally is synchronous but we could make it work
> with async steps
>
> This approach allows for invisible retries within a task and clearly
> communicates that if a step fails, the task is the recovery point. It also
> clarifies the "chunking" concept we were discussing (BTW. "chunk" is
> probably what were looking for - but in this concept we don't have to name
> it as a concept in Airflow):
>
> @step(retries=2)
> async def load(item):
>     await warehouse.write(item)
>
> @task
> async def process_chunk(chunk):
>     load.map(chunk)
>
> # Note! This is just an example of how we can add chunking - we could make
> it better without a separate task, materialization and with some other
> syntactic sugar
> @task
> def make_chunks(items, size=100):
>     return [items[i:i+size] for i in range(0, len(items), size)
>
> process_chunk.expand(chunk=make_chunks(get_items()))
>
> By introducing "step" as a distinct unit, we can more accurately
> communicate that it lacks the independent durability of a task while
> providing users with a familiar mental model.
>
> What do you think?
>
> Best regards,
> Jarek
>

Reply via email to