Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

via GitHub Fri, 21 Feb 2025 10:27:06 -0800


sidshehria commented on issue #1032:
URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2675258479


   @timsaucer 
   Thanks for the clarity!
   
   I now have a better understanding of the DataFrame API, lazy evaluation, and 
the Pandas/Polars integration. I will work through the common operations 
documentation and the data sources page to learn the current implementation in 
detail.
   
   To reduce PyO3 overhead, I will look into:
   1. Profiling the FFI interface to find bottlenecks in Python-Rust data 
movement.
   2. Researching zero-copy data transfer options to reduce overhead further.
   3. Checking whether alternative serialization methods can improve on the 
efficiency of the current pyarrow-based approach.
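
   As a rough illustration of what items 1 and 2 are getting at, here is a 
small stdlib-only sketch. It is not datafusion-python or PyO3 code: the helper 
names (`copy_transfer`, `zero_copy_transfer`, `time_call`) are made up for this 
example, and `bytes` / `memoryview` merely stand in for a copying versus a 
shared-buffer hand-off across a language boundary.

```python
import time


def copy_transfer(buf: bytearray) -> bytes:
    # Stand-in for a copying hand-off: bytes() allocates and copies the buffer.
    return bytes(buf)


def zero_copy_transfer(buf: bytearray) -> memoryview:
    # Stand-in for a zero-copy hand-off: memoryview shares the same memory.
    return memoryview(buf)


def time_call(fn, arg, repeats: int = 20) -> float:
    # Best wall-clock time in seconds over several runs of fn(arg).
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


data = bytearray(16 * 1024 * 1024)  # 16 MiB of zeros

view = zero_copy_transfer(data)
copied = copy_transfer(data)

# Write through the original buffer after both hand-offs.
data[0] = 7
shares_memory = view[0] == 7         # the view observes the write
copy_is_independent = copied[0] == 0  # the copy does not

copy_seconds = time_call(copy_transfer, data)
view_seconds = time_call(zero_copy_transfer, data)
print(f"copy: {copy_seconds:.6f}s  view: {view_seconds:.6f}s")
```

   The real measurement would of course need to target the actual Python-Rust 
boundary; this only shows the kind of copy-versus-share comparison I have in 
mind.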
   
   For parallel execution and distributed processing, I'll study 
datafusion-ray and ballista to understand their current state of development 
and where contributions would be most useful.
   
   I'd love any pointers on known performance pain points in the PyO3 
interface that would be valuable to address!
   
   Thanks again for the guidance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

