jscheffl commented on PR #36029:
URL: https://github.com/apache/airflow/pull/36029#issuecomment-1990936322

   > > "niche" in context we came to version 2.9.0
   > 
   > The feature was actually merged for 2.8 and then reverted due to failing 
tests so it missed the 2.8 cut. Its not a recent new change. I'd rather we make 
an attepmt to solve the pain if we can then wait for 2.10
   > 
   > We can find solutions and warn about possible performance issues like 
having the feature off by default and users must set it explicitly in the 
settings... I am not so worried about this part. We always find ways to make 
everyone happy :)
   
   I was thinking of use cases where persistence is needed and still have 
doubts. The use cases that came into my mind:
   - Any custom logic --> Python code is sufficient
   - Retry cases --> Like example can use the task instance context
   - Context information --> Can use existing XCom
   - Different strategies based on DAG run (e.g. high/low prio executions) --> 
Use DagRun.conf
   - Option to switch the logic --> I'd propose to rather use multiple 
implementations for the variation each static
   - Use a runtime parameter --> Can use existing Variable infrastructure
   - Priority based on time (low prio during day) --> Use standard Python code
   - Priority based on external condition (e.g. weather) --> SHOULD NOT be 
used, big performance risk
   - Some data from other DAGs/Tasks --> You can make a query in the DB, be 
careful what you do not to impact performance
   - Other cases --> If really needed then I'd rather recommend to expend the 
DB Schema for these cases like we have the "task_reschedule" or 
"rendered_task_instance" structures and not put the additional field into every 
tuple
   
   I understand that it was merged and reverted in 2.8 - but this was regarding 
security. Performance concerns stay and this is valid to be experimental. My 
concerns here are mainly performance. Adding millions of Blob into the biggest 
table in the DB Scheme needs some considerations. Otherwise all Airflow users 
need to throw additional hardware on the environment to compensate the overhead 
of this feature. Extending the DB scheme is nothing that is optional for all 
users. It might impact all.
   
   Means: I am totally fine except persistence. (plus the other comment about 
parameter rename)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to