potiuk commented on code in PR #38100:
URL: https://github.com/apache/airflow/pull/38100#discussion_r1522762382
##########
docs/apache-airflow/authoring-and-scheduling/deferring.rst:
##########
@@ -257,3 +257,24 @@ In Airflow, sensors wait for specific conditions to be met before proceeding wit
| Built-in functionality for rescheduling                | Requires custom logic to defer task and handle         |
|                                                        | external changes                                       |
+--------------------------------------------------------+--------------------------------------------------------+
+
+Difference between ``up_for_retry`` and ``deferred`` state
+-------------------------------------------------------------------
+
+In Airflow, an operator in the ``up_for_retry`` state still takes a worker slot, because its process keeps running and simply ``sleep``\ s there. Deferrable Operators behave differently.
+
+When an operator is in the ``up_for_retry`` state, it is essentially waiting to be retried after a failure, but it does not release its resources: the process remains alive, keeping its memory, sockets, and other resources allocated, even though it consumes little CPU. The ``deferred`` state, used only by Deferrable Operators, offers a more sophisticated approach to handling wait conditions. Deferrable Operators serialize and store the task's state, freeing all of its resources. When the condition is met, the task is deserialized and resumes operation, so no resources are held during the wait.
+
++--------------------------------------------------------+--------------------------------------------------------+
+| state='up_for_retry'                                   | state='deferred'                                       |
++========================================================+========================================================+
+| Keeps resources while waiting.                         | Releases resources, pauses execution when idle,        |
Review Comment:
Yes. I think @CongyueZhang - our earlier discussion could be a little misleading (and I realized I was talking about a different retry than you were), so it's great that you proposed the documentation here.
There are several things that you can call "retry" in Airflow:
1) Retry done by the task itself, waiting and retrying in a loop - the task simply retries a certain operation and `sleeps` between attempts, usually via tenacity or a similar mechanism -> this is what I referred to when talking about retrying something that is in progress.
2) Retry that results in the task "failing" and consuming a retry count (`up_for_retry`).
Here @dirrao explained it - the task effectively fails, the slot is freed and resources are freed as well. But in this case the in-progress state of the task is not saved: whatever the task had done so far is lost, and on retry the whole task restarts from the beginning. Waiting can only be done on a "time" basis - when the retry time passes, the task restarts and redoes the job from the beginning. While the task is in the `up_for_retry` state, resources are indeed not used, but when the task is retried it has to re-do everything the previous attempt did, because we do not keep the state of the originally failed task. This MIGHT lead to increased resource usage, because every retry attempt effectively repeats the same "preparation" (whatever that preparation is).
3) Deferring is a mechanism where you can defer the task and serialize its state to disk, letting the Triggerer do the conditional wait. This means your task effectively remains in a "half-done" state, its progress is preserved, and it can effectively resume where it left off - when the condition is met (this might be time-based, or waiting for an external job to complete, or another async-io compatible condition), the state of the task is restored from disk and the task resumes with the state it had before it deferred.
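A rough, Airflow-free sketch of that "serialize the half-done state, resume where you left off" idea - all names here are hypothetical and this is plain `pickle`, not the real Triggerer machinery:

```python
import pickle
import tempfile

class WaitingTask:
    """Hypothetical task (not a real Airflow class): expensive setup,
    then a wait whose progress should survive across processes."""

    def __init__(self, job_id):
        self.job_id = job_id   # stands in for costly initialization
        self.checks_done = 0   # "half-done" progress worth preserving

    def defer(self, path):
        # Serialize the task's state to disk; the worker slot can now be
        # freed while something else (the Triggerer) watches the condition.
        with open(path, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def resume(cls, path):
        # Rebuild the task with its saved state, skipping initialization.
        task = cls.__new__(cls)
        with open(path, "rb") as f:
            task.__dict__.update(pickle.load(f))
        return task

# Round-trip: progress survives, so nothing is redone on resume.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    path = tmp.name
original = WaitingTask(job_id="job-42")
original.checks_done = 3
original.defer(path)
resumed = WaitingTask.resume(path)
print(resumed.job_id, resumed.checks_done)  # job-42 3
```

In real Airflow the serialized payload is the trigger plus the kwargs needed to resume, but the resource story is the same: nothing stays resident between defer and resume.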
So in your progression, roughly speaking:
1) takes resources while waiting, but initialization is done only once
2) retrying causes excessive resource use for redoing the initialization part of the task (for tasks that need reinitialization), but waiting does not keep resources
3) the state is preserved between deferrals and initialization is done only once - so it roughly combines the benefits of 1) and 2), with the extra overhead of a Triggerer that handles waiting for potentially 1000s of such deferred tasks.
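The trade-off between styles 1) and 2) can be sketched in plain Python (hypothetical helpers; real style-1 code would typically use tenacity rather than a hand-rolled loop):

```python
import time

INIT_COUNT = {"in_process": 0, "from_scratch": 0}

def flaky(state={"calls": 0}):
    # Shared mutable default is deliberate here: fails twice, then
    # succeeds, so both retry styles get exercised.
    state["calls"] += 1
    if state["calls"] % 3:
        raise RuntimeError("not ready yet")
    return "done"

def retry_in_process(op, attempts=3, delay=0.01):
    """Style 1): the task loops and sleeps itself (tenacity-style).
    Initialization happens once, but the slot is held the whole time."""
    INIT_COUNT["in_process"] += 1  # one-time setup
    for attempt in range(attempts):
        try:
            return op()
        except RuntimeError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)      # slot occupied while sleeping

def retry_from_scratch(op, attempts=3):
    """Style 2) (up_for_retry): the task fails and is restarted, so
    setup is redone every attempt, but nothing is held while waiting."""
    for attempt in range(attempts):
        INIT_COUNT["from_scratch"] += 1  # setup repeated each retry
        try:
            return op()
        except RuntimeError:
            if attempt == attempts - 1:
                raise

print(retry_in_process(flaky))    # done
print(retry_from_scratch(flaky))  # done
print(INIT_COUNT)                 # setup ran once vs. three times
```

Deferral (style 3) would look like `retry_from_scratch` on the outside - no resources held while waiting - but with the setup counter incremented only once, because the serialized state carries the initialization across attempts.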
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]