potiuk commented on PR #23971: URL: https://github.com/apache/airflow/pull/23971#issuecomment-1168045347
> @potiuk Unfortunately, this PR took a lot of time and I just want to remind you that it was inspired while I worked on #23767 (#23812, #23815, #23816). > > I see that changes in #24476 don't intersect with these. But even though it's better to separate them to make review easier First - general remarrk. I am not debating with your staments and reminders above. Yeah. Things were happening and we were thiking and discussing various ideas while others were being implemented. And from those discussions, it seems that the "being implemented" change is better to be delayed and reworked and likely incorporated in the other work. Unfortunately, this happpens. Sometimes there are ideas that comes after and result in much better outcome that the first one. And it's pretty normal to redo parts of all of the change when we find a better one. This is normal. We aalso should think about the code as liability rather than asset. Sometimes iterating and improving on a PR leads to coming to better idea, which is result of the learnign during a process. There is much bigger value in an idea done wrong rather than with a code that implements a worse solution, even if is already wrritten. BTW. It happened many time for my and other's changes, and if only that results in something that is better for the community - this is perfectly ok, normal and even good. There is also a fallacy of "sunken cost" that you are falling in. The fact that a lot of effort has been invested in something, does not make it any more valuable than alternative solution. If anyone spent a lot of time on something, It does not mean that wee should furher invest in it, if we know we have better solution. The cost is "sunk" already. If the solution is better - we should go for it. If alternative is better - we should go for the alternative. And often that cost is not "totally sunk" - more often than not even tf the code cannot be directly "moved", the learnings from it can and reimplementting same thing using another approach and base takes a fraction of the originally invested time, because the learning/discovery has been already done and the original code can be used as a "blueprint" for reapplying it. If we know better approach the cost that has been done alredy for this PR should not matter. We should simply make better decisions based on the current understanding of a problem. Time spent on doing any solution already shoudl not matter at all - in that decision. Only whether which of the solution is better. Re - 'core sql': We had already a number of problems with DBApi being in core of Airlfow and especially with any efforts there which impacted multiple providers implementing it. We have learned that with the current approach where we have providers released separately from the core, keeping DBApi in "airflow" package has more costs than benefits. The thinking (that was born after you started working on a change and which you change sparked actually - seeing the consequences it cause) is that this approach is not sustainable. For any unification / change that uses current DB Api we will only start reaping any real benefits of in ~ 14 months - i..e about 12 months after 2.4.0 of airflow is released, because only then, any of the code in providers that will only work with the current version of DBApi (2.3 one) can be dropped. And for all 14 months we willl have to live with having to keep backwards compatibility and it will mean that we will not be able to fully utilise the benefits of such unification for more than a year. On the other hand (and this what was the actual driver behind https://github.com/apache/airflow/issues/24422 and https://github.com/apache/airflow/pull/24476 is) - we have a much bigger need to unify the DB API even further. At Airflow Summit we spoke a lot about Lineage and in order to allow column based lineage and generally SQL-lineage, we need to find a better way to make sure that all community-based providers can utilise more "unified" and "fully-featured" DBApi equivalent, that will be able to evolve together with the providers. Thus the idea of 'core.sql' was born. When 'core.sql' is implemented, we will be able to release any DB-API related unifications and changes independently from Airflow releases. This means than 3 months from now we will be able to release all DB providers with new features that anyone will be able to use in Airflow 2.2, 2.3. and that will provide much better lineage integration (and we can deprecate teh DBApi inside Airflow). We will not have to wait 14 months. This is a major win for the community - even if it means that you will have to redo all or parts of your PR. Would you want @kazanzhy to implement your change knowing that it's going to be deprecated almost immediately after and will never be used in reality because we will replace the DB Api with Core.sql provider? Do you think it makes sense? Why would you want to do it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
