dabla commented on PR #62867:
URL: https://github.com/apache/airflow/pull/62867#issuecomment-4184029714

   > Also, I have a question: why are DataFusion-related changes required in 
providers that have no need to know about DataFusion? As I mentioned 
earlier, there was an initial idea to introduce a separate provider, something 
like apache-airflow-providers-apache-datafusion.
   > 
   > DataFusion isn’t limited to object storage support alone. There are also 
table provider capabilities to consider. Currently, DataFusion supports systems 
like Iceberg, Delta Lake, and Hudi. I’ve also been involved in discussions 
(https://github.com/datafusion-contrib/datafusion-table-providers) about 
integrating these capabilities more directly into the DataFusion 
provider. They currently don’t have Python wrappers, but I expect to get them 
there eventually, so that everything works more seamlessly and delivers better 
performance.
   
   If a separate provider were introduced for DataFusion, I would fully agree 
with your approach.
   
   The main concern right now is that the DataFusion implementation imports the 
Google hook. This effectively makes common-sql depend on both DataFusion and 
the Google provider, which feels like unnecessary coupling.
   
   Without that import, the current approach would be fine. However, in its 
current form, it introduces a cross-provider dependency that we should probably 
avoid.
   
   I believe the best solution would be to create a dedicated DataFusion 
provider. That way, it can explicitly depend on both DataFusion itself and the 
Google provider (for the hook), while keeping common-sql free from any 
unintended external dependencies.
   
   But maybe I’m wrong, so I’m curious what others think about this as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
