potiuk commented on PR #23971:
URL: https://github.com/apache/airflow/pull/23971#issuecomment-1168045347

   > @potiuk Unfortunately, this PR took a lot of time and I just want to 
remind you that it was inspired while I worked on #23767 (#23812, #23815, 
#23816).
   > 
   > I see that changes in #24476 don't intersect with these. But even though 
it's better to separate them to make review easier
   
   First - general remarrk. I am not debating with your staments and reminders 
above. Yeah. Things were happening and we were thiking and discussing various 
ideas while others were being implemented. And from those discussions, it seems 
that the "being implemented" change is better to be delayed and reworked and 
likely incorporated in the other work. 
   
   Unfortunately, this happpens. Sometimes there are ideas that comes after and 
result in much better outcome that the first one. And it's pretty normal to 
redo parts of all of the change when we find a better one. This is normal. We 
aalso should think about the code as liability rather than asset. Sometimes 
iterating and improving on a PR leads to coming to better idea, which is result 
of the learnign during a process. There is much bigger value in an idea done 
wrong rather than with a code that implements a worse solution, even if is 
already wrritten. 
   
   BTW. It happened many time for my and other's changes, and if only that 
results in something that is better for the community - this is perfectly ok, 
normal and even good.
   
   There is also a fallacy of "sunken cost" that you are falling in. The fact 
that a lot of effort has been invested in something, does not make it any more 
valuable than alternative solution. If anyone spent a lot of time on something, 
It does not mean that wee should furher invest in it, if we know we have better 
solution. The cost is "sunk" already. If the solution is better - we should go 
for it. If alternative is better - we should go for the alternative. And often 
that cost is not "totally sunk" - more often than not even tf the code cannot 
be directly "moved", the learnings from it can and reimplementting same thing 
using another approach and base takes a fraction of the originally invested 
time, because the learning/discovery has been already done and the original 
code can be used as a "blueprint" for reapplying it.
   
   If we know better approach the cost that has been done alredy for this PR 
should not matter. We should simply make better decisions based on the current 
understanding of a problem. Time spent on doing any solution already shoudl not 
matter at all - in that decision. Only whether which of the solution is better.
   
   Re - 'core sql':
   
   We had already a number of problems with DBApi being in core of Airlfow and 
especially with any efforts there which impacted multiple providers 
implementing it. We have learned that with the current approach where we have 
providers released separately from the core, keeping DBApi in "airflow" package 
 has more costs than benefits.
   
   The thinking (that was born after you started working on a change and which 
you change sparked actually - seeing the consequences it cause) is that this 
approach is not sustainable. For any unification / change that uses current DB 
Api we will only start reaping any real benefits of in ~ 14 months - i..e about 
12 months after 2.4.0 of airflow is released, because only then, any of the 
code in providers that will only work with the current version of DBApi (2.3 
one) can be dropped. And for all 14 months we willl have to live with having to 
keep backwards compatibility and it will mean that we will not be able to fully 
utilise the benefits of such unification for more than a year.
   
   On the other hand (and this what was the actual driver behind 
https://github.com/apache/airflow/issues/24422  and 
https://github.com/apache/airflow/pull/24476 is) - we have a much bigger need 
to unify the DB API even further. At Airflow Summit we spoke a lot about 
Lineage and in order to allow column based lineage and generally SQL-lineage, 
we need to find a better way to make sure that all community-based providers 
can utilise more "unified" and "fully-featured" DBApi equivalent, that will be 
able to evolve together with the providers. Thus the idea of 'core.sql' was 
born. 
   
   When 'core.sql' is implemented, we will be able to release any DB-API 
related unifications and changes independently from Airflow releases. This 
means than 3 months from now we will be able to release all DB providers with 
new features that anyone will be able to use in Airflow 2.2, 2.3. and that will 
provide much better lineage integration (and we can deprecate teh DBApi inside 
Airflow).  We will not have to wait 14 months. This is a major win for the 
community - even if it means that you will have to redo all or parts of your PR.
   
   Would you want @kazanzhy  to implement your change knowing that it's going 
to be deprecated almost immediately after and will never be used in reality 
because we will replace the DB Api with Core.sql provider? Do you think it 
makes sense? Why would you want to do it?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to