timsaucer commented on PR #21030: URL: https://github.com/apache/datafusion/pull/21030#issuecomment-4092124577
One thing I've been thinking about here is scale testing of performance, which I haven't really done for the FFI crate. We could use https://github.com/datafusion-contrib/datafusion-tpch to generate table providers at different scale factors, and then run a series of tests:

1. Pure Rust with no FFI work.
2. Pure Rust, but using two modules and passing the table provider via FFI.
3. Expose the table provider to Python and test with datafusion-python.

What I like about this is that we would be able to see the impact of each layer between the code, ideally with the step from 2->3 having near zero impact. For such a test I would set up a stream, then read in and dump the data as fast as possible.

Since this is orthogonal to the actual FFI work you're proposing, I might try setting this up in a test repo.
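To make the "read in and dump the data as fast as possible" measurement concrete, here is a minimal sketch in std-only Rust. It is not the DataFusion or datafusion-ffi API: `Batch`, `DrainStats`, `drain`, `synthetic_stream`, and `relative_overhead` are all hypothetical names, and a plain `Vec<u64>` iterator stands in for a real stream of Arrow record batches. The idea is that each of the three tiers would feed its own stream to the same drain loop, so the timings are comparable.

```rust
use std::time::{Duration, Instant};

/// Stand-in for a record batch (assumption: a real test would stream
/// Arrow record batches produced by a table provider).
type Batch = Vec<u64>;

/// Counts and wall-clock time collected while draining one stream.
#[derive(Debug, Clone, Copy)]
struct DrainStats {
    batches: usize,
    rows: usize,
    elapsed: Duration,
}

/// Pull every batch from `stream` as fast as possible, discarding the data
/// and keeping only the batch count, row count, and elapsed time.
fn drain<I: Iterator<Item = Batch>>(stream: I) -> DrainStats {
    let start = Instant::now();
    let mut batches = 0;
    let mut rows = 0;
    for batch in stream {
        batches += 1;
        rows += batch.len();
    }
    DrainStats { batches, rows, elapsed: start.elapsed() }
}

/// Hypothetical synthetic source: yields `total_rows` rows in batches of at
/// most `batch_size` rows. A real test would use TPC-H data instead.
fn synthetic_stream(total_rows: usize, batch_size: usize) -> impl Iterator<Item = Batch> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut produced = 0usize;
    std::iter::from_fn(move || {
        if produced >= total_rows {
            return None;
        }
        let n = batch_size.min(total_rows - produced);
        let first = produced as u64;
        let last = (produced + n) as u64;
        produced += n;
        Some((first..last).collect())
    })
}

/// Relative slowdown of one tier versus a baseline tier;
/// 0.05 means the layered run took 5% longer.
fn relative_overhead(baseline: Duration, layered: Duration) -> f64 {
    (layered.as_secs_f64() - baseline.as_secs_f64()) / baseline.as_secs_f64()
}
```

In a real comparison, one would call `drain` once per tier (pure Rust, Rust across the FFI boundary, and the Python-exposed provider measured from its own harness) and feed the `elapsed` values to `relative_overhead`. Checking that `rows` matches across tiers is a cheap sanity test that every layer delivered the same data.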
