alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2890562106
> BTW, I am confused about why there are so many `page_fault`s in the `blocked accumulate`.

Me too -- I looked at the flamegraph you provided and I agree that almost half of the allocation time seems to be spent in page faults / zeroing memory. However, I can't tell whether that is slowness in the underlying uninitialized `Vec` or something else going on.

> * But it may not really help performance much currently (the performance improvement is mainly due to removing the expensive `slice`).

Yes, that was my understanding -- blocked aggregation would only help performance when the number of intermediate groups is large (which forces additional memory allocations).

> But inspired by the `batch_size`-based memory allocation, I am wondering whether we can have some way to reuse memory. And I am trying it today.

I suspect you already know this, but I think you can get the original `Vec` back from an array (see the sketch below) via:

1. `PrimitiveArray::into_parts()` --> get a `ScalarBuffer`
2. `ScalarBuffer::into_inner()` --> get a `Buffer`
3. [`Buffer::into_vec()`](https://docs.rs/arrow/latest/arrow/buffer/struct.Buffer.html#method.into_vec) --> get a `Vec`

However, in the high-cardinality case, I am not sure there are buffers to reuse during aggregation (the buffers are all held until the output is needed, and once the output is produced they don't get re-created).
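Not part of the original comment: a minimal sketch of that reclamation path, assuming a recent `arrow-rs` where these APIs are available. The important detail is that `Buffer::into_vec` is fallible -- reuse only works when the buffer uniquely owns its allocation:

```rust
use arrow::array::Int64Array;

fn main() {
    // Build an array whose backing allocation we would like to reclaim.
    let array = Int64Array::from(vec![1_i64, 2, 3]);

    // 1. PrimitiveArray::into_parts() --> (DataType, ScalarBuffer<i64>, Option<NullBuffer>)
    let (_data_type, scalar_buffer, _nulls) = array.into_parts();

    // 2. ScalarBuffer::into_inner() --> Buffer
    let buffer = scalar_buffer.into_inner();

    // 3. Buffer::into_vec::<i64>() succeeds only when the buffer uniquely owns
    //    its allocation (no other references, and it originated from a Vec);
    //    otherwise the Buffer is handed back in the Err variant.
    match buffer.into_vec::<i64>() {
        Ok(mut vec) => {
            // The allocation is ours again: it can be cleared and reused
            // for the next batch instead of allocating fresh memory.
            vec.clear();
            assert!(vec.capacity() >= 3);
        }
        Err(_shared_buffer) => {
            // The buffer was shared (e.g. the array was sliced or cloned),
            // so a fresh allocation is unavoidable in this case.
        }
    }
}
```

This also illustrates the caveat in the last paragraph: the reuse path only pays off if the accumulator actually gets sole ownership of its buffers back after emitting output.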