jayzhan211 commented on PR #12269:
URL: https://github.com/apache/datafusion/pull/12269#issuecomment-2368451062

   > One idea I had is that you could defer actually copying the new rows into group_values, so rather than calling the function once for each new group, you could call it once per batch, and it could insert all the new values in one function call.

   > That would save some function call overhead as well as the downcasting of arrays, and maybe would vectorize better.
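A minimal sketch of the deferred-copy idea quoted above. It assumes a single `i64` group column and uses a `std::collections::HashMap` in place of DataFusion's real hash table; `GroupStore` and `intern_batch` are hypothetical names, not DataFusion APIs. The probe still visits rows one at a time, but new group values are copied into storage with one call per batch instead of one push per new group:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a group-values store (not DataFusion's `GroupValues`).
/// Keys are plain `i64` instead of Arrow arrays to keep the sketch self-contained.
struct GroupStore {
    map: HashMap<i64, usize>, // key -> group index
    values: Vec<i64>,         // one entry per distinct group
}

impl GroupStore {
    fn new() -> Self {
        GroupStore { map: HashMap::new(), values: Vec::new() }
    }

    /// Assigns a group index to every row of `batch`. During the probe, new
    /// groups are only recorded as row offsets; their values are copied into
    /// `self.values` by a single `extend` after the loop.
    fn intern_batch(&mut self, batch: &[i64], groups: &mut Vec<usize>) {
        groups.clear();
        let mut new_rows: Vec<usize> = Vec::new();
        for (row, key) in batch.iter().enumerate() {
            // Index the group would get if this key turns out to be new.
            let next_id = self.values.len() + new_rows.len();
            let id = *self.map.entry(*key).or_insert_with(|| {
                new_rows.push(row);
                next_id
            });
            groups.push(id);
        }
        // Deferred copy: one gather over the new rows for the whole batch.
        self.values.extend(new_rows.iter().map(|&r| batch[r]));
    }
}
```

With a batch of `[7, 7, 3, 7]`, this sketch assigns group ids `[0, 0, 1, 0]` and copies only `7` and `3` into storage.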
   
   I think the challenge with processing in batches is that if a batch contains the same row multiple times (say n copies), we must push the first one into group values and reject the other n-1 as duplicates. That dependency is not vectorizable, since we need to check the rows one at a time.
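To illustrate the dependency, here is a hedged sketch comparing a data-parallel probe against a frozen snapshot of the existing groups with a sequential probe. Both functions are hypothetical and use plain `i64` keys in a `HashSet` rather than Arrow arrays or DataFusion's hash table:

```rust
use std::collections::HashSet;

/// Probes every row against a frozen snapshot of the existing groups, as a
/// data-parallel lookup would. Rows are independent, so duplicates inside
/// the batch are not detected and each copy is reported as new.
fn naive_new_rows(existing: &HashSet<i64>, batch: &[i64]) -> Vec<usize> {
    batch
        .iter()
        .enumerate()
        .filter(|(_, key)| !existing.contains(*key))
        .map(|(row, _)| row)
        .collect()
}

/// Sequential probe: each row's result depends on earlier rows, because the
/// first occurrence of a key must be inserted before later copies are checked.
fn sequential_new_rows(existing: &HashSet<i64>, batch: &[i64]) -> Vec<usize> {
    let mut seen = existing.clone();
    let mut new_rows = Vec::new();
    for (row, key) in batch.iter().enumerate() {
        // `insert` returns true only for the first occurrence of `key`.
        if seen.insert(*key) {
            new_rows.push(row);
        }
    }
    new_rows
}
```

For the batch `[7, 7, 3, 7]` with no existing groups, `naive_new_rows` reports rows `[0, 1, 2, 3]` as new (four groups for two distinct keys), while `sequential_new_rows` reports `[0, 2]`.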

