Re: [I] Make PyDataFrame.inner_df() public [datafusion-python]

via GitHub Wed, 28 Jan 2026 06:27:51 -0800


timsaucer commented on issue #1354:
URL: https://github.com/apache/datafusion-python/issues/1354#issuecomment-3811627310


   Approach 1: Using df-python
   - Write all of your Rust code for UDFs, etc. and expose it via PyO3 using the FFI approach.
   - Register these with df-python.
   - Write Python-style dataframe operations.
   - Everything still runs at full native speed during execution, with zero copy between the UDFs and the core DF code.
   
   Approach 2: Using df-python as a **Rust** dependency only
   - We update the method to `pub` as requested.
   - You expose to Python all of the specific endpoints you want to use, so probably something that takes that pandas dataframe in its method signature.
   - Everything runs in Rust, and the results are sent back, probably as an Arrow record batch or record batch stream.
   - You will not get any of the datafusion-python Python API (but that may not be an issue).
   
   Approach 3: Mixture of the two
   - Write a table provider that takes that pandas dataframe as input and does all the operations you want under the hood using its own session context (not the datafusion-python session context).
   - Follow-on operations on the output of the table provider could use datafusion-python interfaces.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

