gopidesupavan commented on PR #62867:
URL: https://github.com/apache/airflow/pull/62867#issuecomment-4184076388

   > > Also, I have a question: why are changes related to DataFusion required 
in providers that have no need to know about DataFusion? As I mentioned 
earlier, there was an initial idea to introduce a separate provider, something 
like apache-airflow-providers-apache-datafusion.
   > > DataFusion isn’t limited to object storage support alone. There are also 
table provider capabilities to consider. Currently, DataFusion supports systems 
like Iceberg, Delta Lake, and Hudi. I’ve also been involved in discussions 
(https://github.com/datafusion-contrib/datafusion-table-providers) about 
integrating these functionalities more directly into the DataFusion 
provider. They currently don’t have Python wrappers, but I will eventually get 
them there, so that everything works more seamlessly and delivers better 
performance.
   > 
   > If a separate provider were introduced for DataFusion, I would fully agree 
with your approach.
   > 
   > The main concern right now is that the DataFusion implementation imports 
the Google hook. This effectively creates a dependency on both DataFusion 
and the Google provider within common-sql, which feels like an unnecessary 
coupling.
   > 
   > Without that import, the current approach would be fine. However, in its 
current form, it introduces a cross-provider dependency that we should probably 
avoid.
   > 
   > I believe the best solution would be to create a dedicated DataFusion 
provider. That way, it can explicitly depend on both DataFusion itself and the 
Google provider (for the hook), while keeping common-sql free from any 
unintended external dependencies.
   > 
   > But maybe I’m wrong, so curious what others think of this as well.
   
   Yes 😄, I’m more inclined toward the provider approach. The reason we 
initially started with Common SQL is that we had just begun the Common AI 
provider work, and Apache DataFusion is more naturally aligned with SQL. It 
fits well with the Common SQL pattern and already supports object stores.
   
   So, to introduce some functionality as part of the common.ai provider 
release, especially around object stores, we decided to implement it in the 
common-sql provider for now, rather than introducing a DataFusion-based provider 
too early.
   
   So this can be revisited later. As I mentioned earlier, more features, such 
as table providers, can come afterward. If there’s sufficient 
interest, we can then move or expand this into a dedicated provider. That’s the 
reasoning behind the approach.
   
   @kaxil Please add any pointers or your thoughts :) 
   
   If we have to bring in the separate provider now, let’s create it now.

