Re: [PR] POC: Vectorized hashtable for aggregation [datafusion]

via GitHub Mon, 28 Oct 2024 00:08:21 -0700


Dandandan commented on code in PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#discussion_r1818475481



##########
datafusion/physical-plan/src/aggregates/group_values/group_column.rs:
##########
@@ -287,6 +469,63 @@ where
         };
     }
 
+    fn vectorized_equal_to(

Review Comment:
   At least `take` for `StringArray` is expensive, as it needs to copy the 
string data, recompute the offsets, etc. For string / byte views, `take` will 
probably perform much better.
   I think an approach that uses `take` + `eq` for primitives and a custom 
comparison for string arrays will likely be best overall.
   
   Even better would be to write an optimized implementation (an arrow kernel 
that efficiently compares two arrays for equality at given indices, applying 
the same tricks used by the take / eq kernels). The join implementation could 
benefit from this as well.
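   
   A rough sketch of what "equality by indices" could look like, written in 
plain Rust without the arrow crate. `Utf8Column` and `equal_by_indices` are 
hypothetical names used only for illustration, not existing arrow APIs, and 
nulls are ignored for brevity. The point is that rows are compared in place 
through the offsets, so no string data is copied the way `take` would copy it:

```rust
/// Hypothetical, simplified stand-in for a string array: one contiguous
/// values buffer plus offsets (row i spans offsets[i]..offsets[i + 1]).
/// Null handling is deliberately left out of this sketch.
struct Utf8Column {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl Utf8Column {
    /// Builds the column by appending each row's bytes to the values buffer.
    fn from_strs(rows: &[&str]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for row in rows {
            values.extend_from_slice(row.as_bytes());
            offsets.push(values.len());
        }
        Self { offsets, values }
    }

    /// Returns the bytes of one row without copying them.
    fn value(&self, row: usize) -> &[u8] {
        &self.values[self.offsets[row]..self.offsets[row + 1]]
    }
}

/// Compares lhs[lhs_rows[i]] with rhs[rhs_rows[i]] for every i, reading the
/// bytes in place instead of materializing two gathered arrays first.
fn equal_by_indices(
    lhs: &Utf8Column,
    lhs_rows: &[usize],
    rhs: &Utf8Column,
    rhs_rows: &[usize],
) -> Vec<bool> {
    assert_eq!(lhs_rows.len(), rhs_rows.len());
    lhs_rows
        .iter()
        .zip(rhs_rows)
        .map(|(&l, &r)| lhs.value(l) == rhs.value(r))
        .collect()
}
```

   For example, with group values `["a", "bb", "ccc"]`, input rows 
`["bb", "x", "ccc", "a"]`, and index pairs (1, 0), (0, 1), (2, 2), this 
compares "bb" with "bb", "a" with "x", and "ccc" with "ccc". A real kernel 
would also need to handle nulls and would likely produce a packed boolean 
buffer rather than a `Vec<bool>`.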



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


