Re: [PR] Chunk based iteration in `accumulate_indices` [datafusion]

via GitHub Mon, 18 Nov 2024 00:37:08 -0800


Dandandan commented on code in PR #13451:
URL: https://github.com/apache/datafusion/pull/13451#discussion_r1846086708



##########
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/accumulate.rs:
##########
@@ -395,19 +395,38 @@ pub fn accumulate_indices<F>(
             }
         }
         (None, Some(filter)) => {
-            assert_eq!(filter.len(), group_indices.len());
-            // The performance with a filter could be improved by
-            // iterating over the filter in chunks, rather than a single
-            // iterator. TODO file a ticket
-            let iter = group_indices.iter().zip(filter.iter());
-            for (&group_index, filter_value) in iter {
-                if let Some(true) = filter_value {
-                    index_fn(group_index)
-                }
-            }
+            debug_assert_eq!(filter.len(), group_indices.len());
+            let group_indices_chunks = group_indices.chunks_exact(64);
+            let bit_chunks = filter.values().bit_chunks();
+            let group_indices_remainder = group_indices_chunks.remainder();
+
+            group_indices_chunks.zip(bit_chunks.iter()).for_each(

Review Comment:
   Yeah it seems it is faster (generating faster code) when valid values are 
high enough.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Chunk based iteration in `accumulate_indices` [datafusion]

Reply via email to