timsaucer commented on issue #1032:
URL: 
https://github.com/apache/datafusion-python/issues/1032#issuecomment-2675237730

   For the high-level abstractions, I believe these are already met. The 
DataFrame API is available and widely used (in fact, it's the only way I 
personally use DataFusion). The [common operations online 
documentation](https://datafusion.apache.org/python/user-guide/common-operations/index.html)
 has a handful of sub-pages that describe usage of the API, and the 
[API 
reference](https://datafusion.apache.org/python/autoapi/datafusion/dataframe/index.html#datafusion.dataframe.DataFrame) 
covers it as well.
   
   DataFusion already uses lazy evaluation: DataFrame operations build up a 
plan, and nothing runs until results are requested.
   
   Integration with Pandas and Polars already exists and is described in the 
[data 
sources](https://datafusion.apache.org/python/user-guide/data-sources.html) 
page.
   
   For the efficient batch processing leveraging Arrow's memory format, that is 
how DataFusion operates currently.
   
   For the PyO3 interface, I'm not familiar with what optimizations you have in 
mind to reduce overhead. I'd be curious where you think we have issues 
currently. I'd also love to hear if you have ideas about optimizing the data 
movement between Python and Rust. This is a difficult problem, but we do 
already leverage the pyarrow FFI interface to avoid many of the data 
translation inefficiencies. 
   
   Parallel execution is also already supported, but there are additional 
efforts like `datafusion-ray` and `ballista` where we push the envelope much 
further by going into distributed processing. Those are under heavy/active 
development right now and also a very good place to make contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

