sidshehria commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2675258479
@timsaucer Thanks for the clarity! I understand the explanation on the DataFrame API, lazy mode of evaluation, and Pandas/Polars integration better. I will refer to the common operations documentation and the data sources page more extensively to grasp the current implementation in detail.

To optimize PyO3 overhead, I will look into:

1. Profiling the FFI interface to understand Python-Rust data movement bottlenecks.
2. Researching zero-copy data transfer options to reduce overhead further.
3. Checking if alternative serialization methods can improve efficiency over pyarrow's current approach.

For parallel execution and distributed processing, I'll look into datafusion-ray and ballista to understand their current development and potential contribution areas.

Would love any pointers on known performance pain points in the PyO3 interface that could be valuable to address! Thanks again for the guidance!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org