Hello everyone,

The wide range of integrations with external services, a.k.a. operators, is
one of the biggest advantages of Airflow. As the documentation states:
"An operator represents a single, ideally idempotent, task."

Idempotence is, I think, the key to creating a usable operator. It
ensures that we can run backfills safely and use fewer resources. The
problem is that there's no official Airflow definition of idempotence,
or at least I'm not aware of any.

What do I mean by "Airflow definition"? I mean a guide or recipe for
making an operator idempotent, including the limits of real-world
idempotency.

The reason for bringing up this topic is these two PRs:
- https://github.com/apache/airflow/pull/9593 which improves creating
a Dataproc cluster (create; if it exists, check its state; if the state
is wrong, delete it, wait, and then create a new one)
- https://github.com/apache/airflow/pull/9590 which improves BigQuery
insert job idempotency (submit; if the job_id exists, check its state; if
running/ok, reattach; if failed, generate a new job_id and submit)

Both PRs implement suggestions from our users and solve real,
production-grade problems. Both do this imperfectly, because each of
those operators tries to tackle a variety of idempotence problems. This
requires custom logic that has to work in non-deterministic situations
(e.g. Dataproc and the unknown time it takes to delete a cluster). And
that makes me wonder what the exact definition of "single, ideally
idempotent, task" is.
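
As an illustration of that non-determinism, here is a minimal sketch of
the "create, or delete, wait and recreate" flow with a bounded wait. It
is not the code from the PR: `FakeClusterClient`, `ensure_cluster`, the
state names and the timeout values are hypothetical, and the fake client
only simulates a deletion that completes after an unknown number of
polls.

```python
import time
from typing import Dict, Optional


class FakeClusterClient:
    """In-memory stand-in for a cluster service (hypothetical, not Dataproc)."""

    def __init__(self, polls_until_deleted: int = 2) -> None:
        self.clusters: Dict[str, str] = {}  # name -> state
        self.polls_until_deleted = polls_until_deleted
        self._countdown: Dict[str, int] = {}

    def create(self, name: str) -> None:
        self.clusters[name] = "RUNNING"

    def delete(self, name: str) -> None:
        # Deletion is asynchronous: the cluster lingers in DELETING.
        self.clusters[name] = "DELETING"
        self._countdown[name] = self.polls_until_deleted

    def get_state(self, name: str) -> Optional[str]:
        # Each poll moves a pending deletion one step closer to done.
        if name in self._countdown:
            self._countdown[name] -= 1
            if self._countdown[name] <= 0:
                del self._countdown[name]
                del self.clusters[name]
        return self.clusters.get(name)


def ensure_cluster(client: FakeClusterClient, name: str,
                   timeout: float = 5.0, poll_interval: float = 0.01) -> str:
    """Make sure a RUNNING cluster exists; report what had to be done."""
    state = client.get_state(name)
    if state is None:
        client.create(name)
        return "created"
    if state == "RUNNING":
        return "reused"
    # Wrong state (e.g. ERROR): delete, wait until it is gone, recreate.
    client.delete(name)
    deadline = time.monotonic() + timeout
    while client.get_state(name) is not None:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster {name!r} was not deleted in time")
        time.sleep(poll_interval)
    client.create(name)
    return "recreated"
```

The timeout is the part that is hard to get right: the operator cannot
know how long deletion will take, so it either waits for a bounded time
and fails, or risks blocking a worker slot indefinitely.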

Operators should answer users' needs; there's no question about that.
But it is the community that will have to maintain the operators, and
maintaining complex logic that is hard (or nearly impossible) to test
end to end is not a pleasant task.

What I would like to ask you is:
- what does it mean to you for an operator to be idempotent?
- what does "single task" mean? Does it mean a single event or an
operation (a set of events)?

With your answers, I would like to work on a set of how-to rules for
designing the logic of the `execute` method. I would like to encourage
you to share your experiences with designing and working with complex
operators :)

Hope you are good,
Tomek
