alamb commented on issue #13620: URL: https://github.com/apache/datafusion/issues/13620#issuecomment-2512390596
I agree with @Dandandan -- and specifically it isn't clear to me that an iterator-based approach will be faster than using the `take` kernel. I suspect the bottleneck will be the copy that happens as part of `take`, not the actual management of the indexes.

If the issue is that the indices themselves take up too much space, then perhaps we can put some more effort into generating them incrementally and reusing the arrays, as suggested by @Dandandan.

Here is an example in grouping where we reuse indexes: https://github.com/apache/datafusion/blob/8773846859b0390ceb782602efd403e2487d8552/datafusion/physical-plan/src/aggregates/row_hash.rs#L402-L404
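
To make the "generate indices incrementally and reuse the arrays" idea concrete, here is a rough sketch in plain Rust with no external crates. It is not the arrow `take` kernel and not the DataFusion code linked above: `take_into`, `fill_indices`, and `take_in_chunks` are hypothetical names, and slices and `Vec`s stand in for Arrow arrays. The point is only that the index buffer is allocated once and refilled per chunk, so its size is bounded by the chunk size rather than by the input.

```rust
/// Stand-in for a `take`-style kernel (assumption: plain slices, not Arrow arrays).
/// Copies `values[i]` for every `i` in `indices` into `out`, reusing `out`'s allocation.
fn take_into(values: &[i64], indices: &[u32], out: &mut Vec<i64>) {
    out.clear();
    out.extend(indices.iter().map(|&i| values[i as usize]));
}

/// Hypothetical helper: refills `indices` with the positions in `start..end`
/// whose `mask` entry is true. The buffer is cleared, not reallocated.
fn fill_indices(mask: &[bool], start: usize, end: usize, indices: &mut Vec<u32>) {
    indices.clear();
    for (offset, &keep) in mask[start..end].iter().enumerate() {
        if keep {
            indices.push((start + offset) as u32);
        }
    }
}

/// Processes the input in chunks of `chunk_size` rows. The `indices` and
/// `scratch` buffers are allocated once and reused for every chunk.
fn take_in_chunks(values: &[i64], mask: &[bool], chunk_size: usize) -> Vec<Vec<i64>> {
    assert_eq!(values.len(), mask.len());
    assert!(chunk_size > 0);

    let mut indices: Vec<u32> = Vec::with_capacity(chunk_size);
    let mut scratch: Vec<i64> = Vec::with_capacity(chunk_size);
    let mut batches = Vec::new();

    let mut start = 0;
    while start < values.len() {
        let end = (start + chunk_size).min(values.len());
        fill_indices(mask, start, end, &mut indices);
        take_into(values, &indices, &mut scratch);
        if !scratch.is_empty() {
            // The output batch is the one copy that has to happen per chunk.
            batches.push(scratch.clone());
        }
        start = end;
    }
    batches
}
```

Even in this sketch the index bookkeeping is a small loop over a reused buffer, while the value copy in `take_into` touches every selected row, which is consistent with the suspicion that the copy, not the index management, dominates.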
