GitHub user potiuk edited a comment on the discussion: Add the ability to backfill a DAG based on past Asset Events

> Maybe last question for you: is my usecase so exotic? I'm surprised that the
> need of processing data updates that happened when a DAG was not created yet
> is not something common. If it's exotic, then it might hide a mis-usage of
> Airflow on my side?

We have no idea - you are the first one to ask this question - and to be honest it does not matter how "exotic" it is. What matters is whether someone (for example you) would be willing to make it into a product feature, and whether the community will decide to take on the maintenance burden for it if contributed.

So I do not know about "exoticness" - I have no data to judge it for now - but I am sure that if you want to modify your Dag inter-dependencies in the past and reprocess events from the past in the general case, this is a rather complex "feature" if you want to include our event model and conditional processing. And it means that it will cost a LOT to develop it as a feature and it will be costly in maintenance.

But it's way simpler if you consider a simplified case like yours, where you want to modify your complete dependency set (by adding new Dags) and "pretend" they were always there, and where you can "simplify it" because you use only a small subset of our event features. And developing a script that will handle your simplified case using the APIs we have is not only possible, but also relatively easy (but only if you limit it to your specific case - where you limited the scope of it heavily - because it's "your case").

There is a big difference between a "one-time solution", a "reusable solution" and a "product feature", and there are rule-of-thumb calculations for it:

* one-time solution for your case -> developing a script and running it - costs X
* reusable solution you can share with others who have a similar case - say "installable package with configurable inputs and docs" - costs roughly 3 x the one-time solution (so 3X)
* then - a product feature that handles a generic solution for multiple edge cases - costs roughly 3 x the reusable solution (so 9X)

And this is a "rule of thumb", and 9X is very conservative for many cases. It only holds for really simple cases. Also, when you make it into a product, there is the cost of maintaining this solution, running and fixing tests continuously, fixing bugs, and also the impact it has on developing new features and refactorings of the product (Apache Airflow) that it interacts with.

So - as I suggested from the very beginning - having a one-time solution done by you is cheap, easy for you to test, and seems like the best option for you. To turn it into a "product feature" you will have to spend a LOT more effort - you can think of spending an order of magnitude more time on it. Which of course you might want to do if YOU are convinced it's not an exotic case.

We very much welcome proposals for new AIPs - even if they are not going to make it in the end, there is always something to learn from them. And our AIP (Airflow Improvement Proposal) process is actually designed to answer many of those questions. When you make an AIP proposal on the devlist you will find out:

* how many people will say "good idea, I also need it"
* how many people will say "it's actually easy and can be simplified"
* how many people will say "boy, it's hard and difficult and we do not want to maintain it"
* or maybe someone proposes a way simpler way of doing it
* or maybe everyone will say "this is crap, don't do it"
* or maybe you will find out that there are other similar proposals already
* or maybe you will find out that there is another big feature in the making that will make your proposal far more complex

And I am not the one to make decisions or judgments there - that's not my role as a maintainer. My role is to respond here, try to understand what you are asking for, and point out things that are important.

There is the whole community at `[email protected]` that is far more focused on what is sent to the devlist than on individual discussions in GH. Writing an AIP proposal allows you to formulate your thoughts and design better, and those people will be able to make a way better decision than anyone alone.

So .. I have no idea if your use case is "exotic" - I am not able to answer that question for you, but you can likely find out by proposing an AIP and discussing it on the devlist.

GitHub link: https://github.com/apache/airflow/discussions/59886#discussioncomment-15398697
