magarick commented on issue #462:
URL: 
https://github.com/apache/arrow-datafusion-python/issues/462#issuecomment-1696476891

   Hi Cody! Thanks for your interest in this. I've seen a little bit of Ibis 
and it looks interesting. I'm also not sure improving Ibis support and making a 
better "native" API are conflicting goals.
   
   > My worry here is that you and the DataFusion community are going to go 
through the struggles that every new Python dataframe library does, and start 
facing the same type of questions -- is it `groupby` or `group_by`? `to_csv` or 
`write_csv`? The list goes on. I've seen the fragmentation of the Python data 
community over the years and would be far more excited to work on a standard 
API that supports many backends (Ibis) than bringing another Python dataframe 
library to the table.
   
   These differences, at least as you've described them here, seem more like a 
mild annoyance than a struggle to me. As long as there's reasonable 
documentation, I've never found slightly different names to be nearly as big a 
barrier as identical or similarly named things behaving differently, or 
differing capabilities across libraries.
   
   > We'd love to have more collaboration on Ibis for the DataFusion backend if 
that'd be an interesting direction to you and others. Ibis was created by Wes 
McKinney (creator of pandas) and taken an opinionated stance on most issues I 
suspect you'll face with a new dataframe library. Plus, it takes heavy 
inspiration from R and other previous tools! Let us know if this would be 
interesting to you.
   
   I'm not opposed to this at all, especially if Ibis can provide a consistent 
API while still exposing the full power of each underlying library. At some 
point, though, it seems like you'll encounter differences that preclude a 
uniform interface or require a specialized API for a unique feature Ibis 
doesn't support. However, I can see the appeal if you have people who 
occasionally use a large number of backends or are trying to build something 
that can interact with multiple systems. So I'd be surprised if there weren't 
value in both a native interface that exposes all of a tool's power and a 
universal interface, since they seem to solve different problems. If I'm 
wrong about Ibis' goals and capabilities, please do correct me.
   
   > It's already integrated with visualization frameworks (Altair, Plotly, 
Streamlit -- any that support the `__dataframe__` protocol natively, and any 
others through `to_pandas()`) and ML frameworks (scikit-learn, XGBoost, more in 
this area coming soon).
   
   Glad that you brought this up. What's the relationship between Ibis and the 
Python dataframe standards protocol?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

