westonpace commented on PR #33731: URL: https://github.com/apache/arrow/pull/33731#issuecomment-1409418527
> I added total_bytes_allocated and num_allocations, since they are metrics available and seem like they would be useful. For example, it might help remind of us places where we aren't calling Reserve on array builders when we should be. Does that seem sensible? Yes, it seems like it could be useful in some cases. Atomic increments aren't free but we don't allocate in critical sections so I think it's fine. > In order to be able to use these metrics for individual benchmarks, we needed to expose a ResetStatistics() method. However, this changes the semantics of the statistics, since they are no longer guaranteed to represent the global state since the beginning of the program. That's fine as long as no one was relying on them for an accurate picture of memory use at the time, but I do worry it could be a foot-gun. As long as no one calls this it should be fine. It looks like we're only calling this in the benchmarks, I think it's ok then. Maybe don't expose it in python/r or maybe you could make it private and make the benchmark utilities a friend. > I added memory tracking only on benchmarks that I felt would benefit from it. For example, I excluded flight because AFAIK many of the allocations happen in gRPC outside of the memory pool, so this method of measurement wouldn't be particularly useful there. Do the ones I chose seem reasonable? No strong opinions. Seems like a generally harmless addition. Pity we can't get malloc statistics (maybe we could with jemalloc? but that can be for a future PR). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
