Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

via GitHub Fri, 21 Feb 2025 08:25:54 -0800


sidshehria commented on issue #1032:
URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2674996952


   @timsaucer Yes, I have a few possible solutions in mind. Kindly review them:
   
   **1. Higher-Level Abstractions:**
   
   - Introduce a DataFrame-like API that feels more intuitive, similar to Pandas/Polars.
   - Expose simplified query execution methods, reducing the need for manual SQL queries.
   - Provide a lazy evaluation mode to optimize performance in large-scale data operations.
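
   To make the lazy evaluation idea concrete, here is a rough, library-free sketch. `LazyFrame` and `_project` are hypothetical names of my own, not part of datafusion-python: each method call only records a step, and the data source is not read until `collect()` is called.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

Row = dict


def _project(rows: Iterable[Row], columns: tuple) -> Iterator[Row]:
    # Generator function, so `columns` is bound when the step is built.
    for row in rows:
        yield {name: row[name] for name in columns}


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical toy class: records steps, runs them only on collect()."""

    source: Callable[[], Iterable[Row]]
    steps: tuple = ()

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # Return a new plan; nothing is read from the source yet.
        return LazyFrame(self.source, self.steps + (("filter", predicate),))

    def select(self, *columns: str) -> "LazyFrame":
        return LazyFrame(self.source, self.steps + (("select", columns),))

    def collect(self) -> list:
        # Only here is the source read and the recorded steps applied.
        rows = iter(self.source())
        for kind, arg in self.steps:
            if kind == "filter":
                rows = filter(arg, rows)
            else:
                rows = _project(rows, arg)
        return list(rows)
```

   A plan such as `LazyFrame(load).filter(lambda r: r["a"] > 1).select("b")` does no work until `.collect()` is called. A real engine would use that delay to optimize the plan first, which this toy version does not attempt.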
   
   **2. Better Integration with Pandas/Polars:**
   
   - Implement direct conversion utilities between DataFusion and Pandas/Polars DataFrames.
   - Improve data type compatibility to ensure smooth interoperability.
   - Support efficient batch processing, leveraging Arrow’s memory format.
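
   As a rough illustration of the batch-oriented conversion I have in mind, the sketch below groups row records into column-oriented batches and back, using plain Python dicts and lists as a stand-in for Arrow record batches. The function names are hypothetical, and a real implementation would work on Arrow arrays rather than Python lists.

```python
from itertools import islice
from typing import Iterable, Iterator


def rows_to_column_batches(rows: Iterable[dict], batch_size: int) -> Iterator[dict]:
    """Group row dicts into column-oriented batches of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        # Assumes every row in the chunk has the same keys as the first row.
        names = list(chunk[0])
        yield {name: [row[name] for row in chunk] for name in names}


def column_batches_to_rows(batches: Iterable[dict]) -> Iterator[dict]:
    """Inverse conversion: flatten column-oriented batches back into row dicts."""
    for batch in batches:
        names = list(batch)
        for values in zip(*(batch[name] for name in names)):
            yield dict(zip(names, values))
```

   Processing one batch at a time keeps memory bounded, since only `batch_size` rows are materialized at once.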
   
   **3. Performance Optimizations in the FFI Layer:**
   
   - Reduce overhead in Python-Rust interop using PyO3/maturin optimizations.
   - Optimize data movement between Python and Rust to minimize serialization costs.
   - Explore parallel execution to enhance computation speed for large datasets.
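
   To show the difference between copying data and sharing a buffer, here is a small standard-library analogy. It is not DataFusion or PyO3 code: Arrow's C Data Interface is the mechanism that would apply across the Python-Rust boundary, while this sketch only uses Python's built-in buffer protocol to show the same principle.

```python
from array import array

values = array("d", [1.0, 2.0, 3.0, 4.0])

# Copying path: serialize to bytes and rebuild, which allocates a second buffer.
copied = array("d")
copied.frombytes(values.tobytes())

# Zero-copy path: a memoryview exposes the original buffer without copying,
# and slicing the view does not copy either.
view = memoryview(values)
window = view[1:3]

# A write to the original is visible through the view but not in the copy.
values[1] = 20.0
shared_value = window[0]
copied_value = copied[1]
```

   After the write, `shared_value` reflects the update while `copied_value` still holds the old number, which is the behavior a zero-copy FFI path relies on.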


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


