lostmygithubaccount commented on issue #462: URL: https://github.com/apache/arrow-datafusion-python/issues/462#issuecomment-1701865868
> I'm also not sure improving Ibis support and making a better "native" API are conflicting goals.

I don't disagree, but I think they are duplicative efforts. My overall concern here is that the DataFusion project will spend a lot of time and energy rehashing discussions about what a great Python dataframe library should be, and that this is bad for the ecosystem overall.

Ibis's primary innovation is portability: decoupling the dataframe API from the execution engine. While having a separate API per engine might not be a big deal for an individual developer who can learn the pandas, Polars, Snowpark, and PySpark APIs, it is a major source of siloing and duplicated effort across the ecosystem. I've heard many cases of data scientists "throwing pandas code over the wall" for data/ML engineers to rewrite in PySpark.

With the Ibis project (and other open-source projects) we (the Voltron Data we) are hoping to create a more modular and composable ecosystem like this:

<img width="1187" alt="image" src="https://github.com/apache/arrow-datafusion-python/assets/54814569/19db26a1-482d-46db-89aa-d14ecb090daa">

I do think it's totally valid for DataFusion to pursue its own Python dataframe API. I just think it would be better to spend effort improving DataFusion as an execution engine and to leverage all the work already done in Ibis for the user-facing API!

Very fair concern on how Ibis achieves what it claims. We're actually in the process of moving our documentation to Quarto, and I'd love your feedback on the ["Why Ibis?"](https://ibis-quarto.netlify.app/why) page and the ["Backends concept page"](https://ibis-quarto.netlify.app/concepts/backend) that explain it. In short, SQL dialects and the tooling for them (sqlalchemy, sqlglot, etc.) are close enough and manageable enough with the scalable API Ibis exposes. For the most part, Python expressions are compiled to SQL.

--
This is an automated message from the Apache Git Service.
