GitHub user potiuk edited a comment on the discussion: Add the ability to 
backfill a DAG based on past Asset Events

> Maybe last question for you: is my usecase so exotic? I'm surprised that the 
> need of processing data updates that happened when a DAG was not created yet 
> is not something common. If it's exotic, then it might hide a mis-usage of 
> Airflow on my side?

We have no idea - you are the first one to ask this question - and to be honest 
it does not matter how "exotic" it is. What matters is whether someone (for 
example you) would be willing to make it into a product feature, and whether 
the community will decide to take on the maintenance burden for it if 
contributed.

So I do not know about "exoticness" - I have no data to judge it for now - but 
I am sure that if you want to modify your Dag inter-dependencies in the past 
and reprocess events from the past in the general case, this is a rather 
complex "feature" if you want to include our event model and conditional 
processing. That means it will cost a LOT to develop as a feature and it will 
be costly to maintain.

But - it's way simpler if you consider a simplified case like yours, where you 
want to modify your complete dependency set (by adding new Dags) and "pretend" 
they were always there, and where you can simplify things because you use only 
a small subset of our event features. Developing a script that handles your 
simplified case using the APIs we have is not only possible, but also 
relatively easy - but only if you limit it to your specific case, where the 
scope is heavily limited because it's "your case".
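A minimal sketch of what the core of such a one-off script could look like for the simplified case, assuming you have already fetched the past asset events (for example through the REST API) and will trigger the new Dag yourself. The event fields (`asset_uri`, `timestamp`), the function names and the shape of the trigger requests are hypothetical placeholders, not Airflow's actual API; check the API reference of your Airflow version before wiring this to real calls.

```python
from datetime import datetime, timezone


def events_to_replay(events, asset_uri, dag_created_at):
    """Return events for `asset_uri` that happened before the Dag existed, oldest first."""
    missed = [
        e for e in events
        if e["asset_uri"] == asset_uri and e["timestamp"] < dag_created_at
    ]
    return sorted(missed, key=lambda e: e["timestamp"])


def build_trigger_requests(dag_id, missed_events):
    """Build one (hypothetical) trigger request per missed event."""
    return [
        {
            "dag_id": dag_id,
            "logical_date": e["timestamp"].isoformat(),
            "conf": {"replayed_asset_uri": e["asset_uri"]},
        }
        for e in missed_events
    ]


if __name__ == "__main__":
    # Made-up sample data standing in for events fetched from your deployment.
    sample_events = [
        {"asset_uri": "s3://bucket/a", "timestamp": datetime(2025, 1, 2, tzinfo=timezone.utc)},
        {"asset_uri": "s3://bucket/a", "timestamp": datetime(2025, 1, 1, tzinfo=timezone.utc)},
        {"asset_uri": "s3://bucket/b", "timestamp": datetime(2025, 1, 1, tzinfo=timezone.utc)},
        {"asset_uri": "s3://bucket/a", "timestamp": datetime(2025, 3, 1, tzinfo=timezone.utc)},
    ]
    created = datetime(2025, 2, 1, tzinfo=timezone.utc)
    missed = events_to_replay(sample_events, "s3://bucket/a", created)
    for request in build_trigger_requests("my_new_dag", missed):
        print(request)
```

This deliberately ignores conditional asset expressions and multi-asset schedules, which is exactly the kind of scope limit that keeps the one-off script cheap.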

There is a big difference between a "one-time solution", a "reusable solution" 
and a "product feature", and there are rule-of-thumb calculations for their 
relative costs:

* one-time solution for your case -> developing a script and running it - 
costs X
* reusable solution you can share with others who have a similar case - say an 
"installable package with configurable inputs and docs" - costs roughly 3 x the 
one-time solution (so 3X)
* product feature that handles a generic solution for multiple edge cases - 
costs roughly 3 x the reusable solution (so 9X)

And this is a "rule of thumb", and 9X is very conservative for many cases: it 
only holds for really simple cases. Also, when you make it into a product, 
there is the cost of maintaining the solution, running and fixing tests 
continuously, fixing bugs, and also the impact it has on developing new 
features and refactoring the product (Apache Airflow) that it interacts with.
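To make the rule of thumb concrete, here is the arithmetic as a small sketch. The function name and the unit of cost (say, person-days) are illustrative assumptions; the factor of 3 is just the rough multiplier from the list above, and ongoing maintenance cost is not included.

```python
def rule_of_thumb_costs(one_time_cost, factor=3):
    """Estimate relative costs from the cost of a one-time solution.

    Each step up (one-time -> reusable -> product feature) multiplies
    the cost by roughly `factor`. Maintenance is NOT included.
    """
    reusable = one_time_cost * factor
    product_feature = reusable * factor
    return {
        "one_time": one_time_cost,
        "reusable": reusable,
        "product_feature": product_feature,
    }


if __name__ == "__main__":
    # A script that takes 2 person-days as a one-off:
    print(rule_of_thumb_costs(2))
    # {'one_time': 2, 'reusable': 6, 'product_feature': 18}
```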

So -> as I suggested from the very beginning -> a one-time solution done by 
you is cheap, easy for you to test, and seems like the best option for you. 
To turn it into a "product feature", you will have to spend a LOT more effort 
- think of spending an order of magnitude more time on it. Which you of course 
might want to do if YOU are convinced it's not an exotic case. We very much 
welcome proposals for new AIPs - even if they do not make it in the end, there 
is always something to learn from them.

And our AIP (Airflow Improvement Proposal) process is actually designed to 
answer many of those questions. When you make an AIP proposal on the devlist 
you will find out:

* how many people will say "good idea, I also need it"
* how many people will say "it's actually easy and can be simplified"
* how many people will say "boy, it's hard and difficult and we do not want to 
maintain it"
* or maybe someone will propose a way simpler way of doing it
* or maybe everyone will say "this is crap, don't do it"
* or maybe you will find out that there are other similar proposals already
* or maybe you will find out that there is another big feature in the making 
that will make your proposal far more complex

And I am not the one to make decisions or judgments there - that's not my role 
as a maintainer. My role is to respond here, try to understand what you are 
asking for and point out things that are important. There is the whole 
community at `[email protected]` that is far more focused on what is sent to 
the devlist than on individual discussions. Writing an AIP proposal allows you 
to formulate your thoughts and design better, and those people will be able to 
make a way better decision than anyone alone.

So .. I have no idea if your use case is "exotic" - I am not able to answer 
that question for you, but you can likely find out by proposing an AIP and 
discussing it on the devlist.


GitHub link: 
https://github.com/apache/airflow/discussions/59886#discussioncomment-15398697

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]