jayzhan211 commented on PR #12269: URL: https://github.com/apache/datafusion/pull/12269#issuecomment-2368451062
> One idea I had is that you could defer actually copying the new rows into group_values so rather than calling the function once for each new group, you could call it once per batch, and it could insert all the new values in one function call > That would save some function call overhead as well as the downcasting of arrays and maybe would vectorize better I think the challenge of processing in batch is that if we got multiple same row, we should push the first one in group values but reject another n-1 ones as duplicated row values. The dependency is not vectorizable, since we need to check them iteratively. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
