alamb commented on PR #2177: URL: https://github.com/apache/arrow-datafusion/pull/2177#issuecomment-1113608594
@gandronchik thank you for the explanation in this PR's description. It helps, though I will admit I still don't fully understand what is going on.

I agree with @doki23 -- I expect a table function to logically return a table (that is, something with both rows and columns).

> Regarding signature, I decided to use a single vector and vector with sizes of sections instead of vec of vecs to have better performance. If we use Vec, this will require a lot of memory in case of a request for millions of rows.

The way the rest of DataFusion avoids buffering all the intermediate results in memory at once is with `Stream`s, but that requires interacting with Rust's `async` ecosystem, which is non-trivial.

If you wanted a streaming solution, the signature might look something like the following (maybe):

```rust
Arc<dyn Fn(Box<dyn SendableRecordBatchStream>) -> Result<Box<dyn SendableRecordBatchStream>> + Send + Sync>;
```
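
To make the "stream in, stream out" shape above a bit more concrete, here is a minimal sketch that runs with only the standard library. It is not DataFusion code: `Batch`, `BatchIter`, `TableFn`, and `expand_series` are hypothetical names invented for illustration, and a synchronous boxed `Iterator` stands in for an async `Stream` so that no `async` runtime is needed. The point it tries to show is that each output batch is produced only when the consumer asks for it, so the full result is never buffered at once.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for Arrow's `RecordBatch`: a single column of i64 values.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    values: Vec<i64>,
}

/// Hypothetical stand-in for a record batch stream: a boxed, lazy sequence of
/// fallible batches. ASSUMPTION: a synchronous `Iterator` replaces the async `Stream`.
type BatchIter = Box<dyn Iterator<Item = Result<Batch, String>> + Send>;

/// Shape of a streaming table function: it consumes one lazy sequence of
/// batches and returns another.
type TableFn = Arc<dyn Fn(BatchIter) -> Result<BatchIter, String> + Send + Sync>;

/// Illustrative table function: every input value `n` expands to the rows `0..n`.
fn expand_series() -> TableFn {
    Arc::new(|input: BatchIter| -> Result<BatchIter, String> {
        // `map` is lazy: an output batch is built only when the caller pulls it.
        let output = input.map(|batch| {
            batch.map(|b| Batch {
                values: b.values.iter().flat_map(|&n| 0..n).collect(),
            })
        });
        Ok(Box::new(output) as BatchIter)
    })
}

fn main() {
    let input: BatchIter = Box::new(
        vec![Batch { values: vec![2, 3] }, Batch { values: vec![1] }]
            .into_iter()
            .map(Ok::<Batch, String>),
    );

    let table_fn = expand_series();
    let output = table_fn(input).expect("table function failed");

    // Batches are pulled one at a time; errors would surface per batch.
    for batch in output {
        println!("{:?}", batch);
    }
}
```

A real implementation would presumably return Arrow record batches with a schema and use the `futures` `Stream` trait, which is where the `async` complexity mentioned above comes in.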
