We should be careful not to treat every line in the docs as "constitution" -- i.e. as a commandment.
And in the docs, I think we would be better off if we more clearly distinguished (1) the description of what *is* from (2) opinion about what *should be.*

*This line should be chopped*

Case in point is the line that animates this thread: *"An operator represents a single, ideally idempotent, task."* (from here <https://airflow.readthedocs.io/en/stable/howto/operator/index.html>)

It's just a guess, but I suspect that this line was not meant (or voted on) as a binding rule for the airflow project, but merely meant to serve as a one-sentence answer to the question "what is an operator" on what is essentially a table of contents page.

I think we should actually remove the line. Normative bits should be in a normative context, and this kind of content makes more sense in an "operator design patterns" page, or a "best practices" section, where the merits of different patterns and reasoning can be presented. And to the extent it is meant as a guideline or a rule -- it's too vague to be useful.

So I'd propose we chop it and just leave the second line:

> See the Operators Concepts
> <https://airflow.readthedocs.io/en/stable/concepts.html#concepts-operators>
> documentation and the Operators API Reference
> <https://airflow.readthedocs.io/en/stable/_api/index.html> for more
> information.

*Is idempotence ideal?*

Incidentally, even in the context of a "best practices page" I'd argue against the claim that *"idempotence is ideal."* First of all it needs clarification about what it actually means. But suppose that we accept that it means the canonical execution date pattern. Some pipelines and tasks lend themselves to this pattern; some do not. And while it is a good pattern where it works, it's not the only valid design pattern, it isn't the best solution for every data problem, and therefore it doesn't make sense to refer to it as "the ideal pattern".
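
For concreteness, here is a minimal sketch of what I take "the canonical execution date pattern" to mean: a task owns exactly one date partition and overwrites it on every run, so rerunning for the same date leaves the same state. This is plain Python with sqlite3, not Airflow API; `load_partition`, the `daily_totals` table, its columns, and the sample date are all made up for illustration.

```python
import sqlite3


def load_partition(conn, execution_date, amounts):
    """Replace the partition for execution_date (delete, then insert)."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM daily_totals WHERE ds = ?", (execution_date,))
        conn.executemany(
            "INSERT INTO daily_totals (ds, amount) VALUES (?, ?)",
            [(execution_date, amount) for amount in amounts],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_totals (ds TEXT, amount INTEGER)")

# Running the same date twice leaves the same state as running it once.
load_partition(conn, "2020-01-01", [10, 20])
load_partition(conn, "2020-01-01", [10, 20])

row_count = conn.execute("SELECT COUNT(*) FROM daily_totals").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM daily_totals").fetchone()[0]
```

Note that this only works because the output can be keyed by date. A task that, say, sends a notification or appends to an external log has no partition to overwrite, which is the kind of case I mean by "some do not".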
The execution_date-based idempotence pattern has special importance to airflow but I think that in reality the average cluster will have a variety of design patterns -- not all of them using the execution_date idempotence pattern. And I think we should reflect that reality in our docs and decision-making.

*What is a single task?*

Notwithstanding the above, regarding Tomek's question about the meaning of "single task", I think in effect what is meant here is just "*discrete* task" or "unit of work" -- a unit of work that can be picked up and executed on a worker. I don't take it as a claim about *recommended* operator scope -- if that's what it is meant to be, that claim should probably be made explicitly and in an appropriate context.

*What an operator is, vs what is mergeable*

On another note, I think it also may be helpful to separate the question "what is an operator" from "what kinds of operators belong in airflow". Indeed this is another area of ambiguity in the quote above -- is it a claim about a best practice for users, as they implement operators for their organization? Or is it a claim about guidelines when considering whether to merge a new operator into airflow?

From the perspective "what is an operator", it is clear to me that an operator (1) is not necessarily idempotent, and (2) has arbitrary scope (i.e. re Tomek's 'what is single task' question).

- idempotence is in general undefined because it depends entirely on how the user defines the task (e.g. look at any SqlOperator).
- scope is clearly arbitrary because `execute` can be implemented arbitrarily.

Concerning "what kinds of operators belong in airflow"... I think it's clear that idempotence is not a requirement (because it's not in general a thing that is determinable based on operator design alone, but depends on usage). But, are there principles or guidelines that we should try to adhere to, or evaluate against? There very well might be.
Or, should we try to maintain *compatibility* with a certain notion of idempotence, even if we don't have well-defined idempotence criteria? Maybe so.
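
To illustrate the earlier point that idempotence depends on usage rather than on operator design, here is a rough sketch. `SqlTask` is a hypothetical stand-in for a generic SQL operator (it is not Airflow's class or API), and the `events` table and both statements are made up. The same class is idempotent or not depending only on the SQL the user hands it.

```python
import sqlite3


class SqlTask:
    """Hypothetical stand-in for a generic SQL operator; not Airflow's API."""

    def __init__(self, sql):
        self.sql = sql

    def execute(self, conn):
        # The task just runs what it was given; it cannot know whether
        # rerunning that statement is safe.
        with conn:
            conn.execute(self.sql)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")

# Same class, two usages: a blind append and a keyed overwrite.
append = SqlTask("INSERT INTO events (name) VALUES ('run')")
upsert = SqlTask("INSERT OR REPLACE INTO events (id, name) VALUES (100, 'run')")

# Run each task twice, as a retry or a backfill rerun would.
for task in (append, append, upsert, upsert):
    task.execute(conn)

appended = conn.execute("SELECT COUNT(*) FROM events WHERE id != 100").fetchone()[0]
upserted = conn.execute("SELECT COUNT(*) FROM events WHERE id = 100").fetchone()[0]
```

After two runs each, the append usage has left two rows while the keyed overwrite has left one, with no difference in the operator itself. That is why I don't think idempotence can be a merge requirement evaluated on operator design alone.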
