lostmygithubaccount commented on issue #462:
URL: 
https://github.com/apache/arrow-datafusion-python/issues/462#issuecomment-1701865868

   > I'm also not sure improving Ibis support and making a better "native" API 
are conflicting goals.
   
   I don't disagree, but I think they are duplicative efforts. My overall 
concern here is that the DataFusion project will spend a lot of time and energy 
on re-hashing a lot of these discussions about what a great Python dataframe 
library should be and this is bad for the ecosystem overall. 
   
   Ibis's primary innovation is portability, decoupling the dataframe API from 
the execution engine. While this might not be a big deal for an individual 
developer who can learn pandas & Polars & Snowpark & PySpark APIs, it 
represents a major source of siloing and duplication of efforts across the 
ecosystem. I've heard many cases of data scientists "throwing pandas code 
over the wall" for data/ML engineers to rewrite in PySpark. With the Ibis 
project (and other open-source projects), we (the Voltron Data we) are hoping 
to create a more modular and composable ecosystem like this:
   
   <img width="1187" alt="image" 
src="https://github.com/apache/arrow-datafusion-python/assets/54814569/19db26a1-482d-46db-89aa-d14ecb090daa">
   
   I do think it's totally valid for DataFusion to pursue its own Python 
dataframe API, I just think it would be better to spend effort improving 
DataFusion as an execution engine and leveraging all the work already done in 
Ibis for the user-facing API! 
   
   Very fair concern on how Ibis achieves what it claims -- we're actually in 
the process of moving our documentation to Quarto and I'd love your feedback on 
the ["Why Ibis?"](https://ibis-quarto.netlify.app/why) page and ["Backends 
concept page"](https://ibis-quarto.netlify.app/concepts/backend) that explains 
it -- in short, SQL dialects and tooling for them (sqlalchemy, sqlglot, etc.) 
are close enough and manageable enough with the scalable API Ibis exposes. For 
the most part, Python expressions are compiled to SQL. 
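
To make the "expressions are compiled to SQL" idea concrete, here is a toy sketch of describing a query once and rendering it for more than one SQL dialect. The `Query` class and `compile_sql` function are hypothetical names invented for this illustration. They are not part of the Ibis API, and Ibis's actual compiler is far more general.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical toy types for illustration only; this is NOT the Ibis API.
@dataclass(frozen=True)
class Query:
    table: str
    columns: Tuple[str, ...]
    limit: int


def compile_sql(query: Query, dialect: str) -> str:
    """Render the same logical query as SQL text for a given dialect."""
    cols = ", ".join(query.columns)
    if dialect == "mssql":
        # SQL Server puts the row limit right after SELECT.
        return f"SELECT TOP {query.limit} {cols} FROM {query.table}"
    # Many other dialects (e.g. PostgreSQL, DuckDB) use a trailing LIMIT.
    return f"SELECT {cols} FROM {query.table} LIMIT {query.limit}"


# One backend-independent description of the query...
query = Query(table="events", columns=("user_id", "ts"), limit=10)

# ...compiled to two different dialects.
postgres_sql = compile_sql(query, "postgres")
mssql_sql = compile_sql(query, "mssql")
print(postgres_sql)
print(mssql_sql)
```

The user-facing description stays the same while only the compilation step varies by engine, which is the decoupling described above.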


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
