dstandish edited a comment on issue #6210: [AIRFLOW-5567] 
BaseReschedulePokeOperator
URL: https://github.com/apache/airflow/pull/6210#issuecomment-558030520
 
 
   > Having a state table will have a fundamental impact on the idempotency of 
the execution of the tasks.
   
   It's optional to use such a thing.  Just like it is with XCom.  If you don't 
use it, nothing is changed.
   
   > Why would the manual triggering of a dag introduce issues, the execution 
date will be equal to the moment that it was triggered. I think it should work 
as well.
   
   Because execution_date is the run date minus one interval, and `xcom_pull` sorts 
by execution_date.  So, suppose I want to persist state with XCom (which I do 
in many jobs), and I have a daily job running at midnight.  At the end of each 
run, we push some value to XCom.  At the start of the next run, we retrieve the 
last value and use it somehow. Consider this case:
   * run 1: 12am D1
   * run 2: manually triggered at 8am (exec date is D1 8am; xcom retrieves from 
run 1)
   * run 3: 12am D2
   * run 4: 12am D3
   * run 5: 12am D4
   
   Outcome:
   * Run 3 will retrieve the XCom from run 1, because its execution date is 
prior to run 2's execution date.
   * Run 4 retrieves the XCom from run 2 for the same reason.
   * Run 5 retrieves the XCom from run 4 (finally things are back in order); 
run 3's XCom is never retrieved by any job.
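   
   The ordering above can be checked with a small stand-alone model.  This is 
only a sketch: it does not call Airflow, and it assumes the retrieval rule is 
"take the value pushed under the latest execution date strictly earlier than 
the current run's".  The calendar date chosen for D1 and the names `runs` and 
`simulate` are made up for illustration.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)
D1 = datetime(2019, 1, 1)  # arbitrary date standing in for "D1"

# Runs listed in the order they actually start (wall-clock order).
# Each entry is (label, execution_date).
runs = [
    ("run 1", D1 - DAY),                 # scheduled, starts 12am D1
    ("run 2", D1 + timedelta(hours=8)),  # manual trigger at 8am D1
    ("run 3", D1),                       # scheduled, starts 12am D2
    ("run 4", D1 + DAY),                 # scheduled, starts 12am D3
    ("run 5", D1 + 2 * DAY),             # scheduled, starts 12am D4
]


def simulate(runs):
    """Return {run label: label of the run whose value it pulls}."""
    store = {}   # execution_date -> label of the run that pushed a value
    pulled = {}
    for label, exec_date in runs:
        # Assumed rule: pull from the latest strictly earlier execution date.
        prior = [d for d in store if d < exec_date]
        pulled[label] = store[max(prior)] if prior else None
        # Each run pushes its own value when it finishes.
        store[exec_date] = label
    return pulled


result = simulate(runs)
for label, source in result.items():
    print(label, "pulls from", source)
```

   Under that assumed rule, no run ever pulls the value pushed by run 3, which 
is the gap described in the outcome list.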
   
   The schedule interval edge PR would resolve the execution date ordering 
problem.  But if XCom is cleared at the start of a task, it remains problematic 
as a mechanism for state persistence.
   
   > Since this will introduce such as a fundamental change to the way 
operators were intended, being idempotent, I think it would be great to first 
start an AIP on the topic, so we can have a clear and structured approach.
   
   An AIP sounds appropriate.  I am a bit skeptical of the notion that this is 
a radical change; I would be shocked if stateful processes were not already an 
extremely common use pattern.  I suspect the effect of such a change would be 
more to provide better support for an existing use pattern than to change the 
way airflow is used (because it can already be done with XCom, if only 
imperfectly).
   
   Anyway, I have occasionally rambled on dev list about these issues.  I am 
not sure what the best solution is.  I wish there could be clearer and more 
generalized separation between the concepts of "run date" and "interval of 
interest", but I am not sure what that should look like.  But having a simple 
way to persist state would be of great immediate help to me, and to this PR, 
incidentally.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
