westonpace commented on PR #33731:
URL: https://github.com/apache/arrow/pull/33731#issuecomment-1409418527

   > I added total_bytes_allocated and num_allocations, since they are metrics 
available and seem like they would be useful. For example, it might help remind 
of us places where we aren't calling Reserve on array builders when we should 
be. Does that seem sensible?
   
   Yes, it seems like it could be useful in some cases.  Atomic increments 
aren't free but we don't allocate in critical sections so I think it's fine.
   
   > In order to be able to use these metrics for individual benchmarks, we 
needed to expose a ResetStatistics() method. However, this changes the 
semantics of the statistics, since they are no longer guaranteed to represent 
the global state since the beginning of the program. That's fine as long as no 
one was relying on them for an accurate picture of memory use at the time, but 
I do worry it could be a foot-gun. 
   
   As long as no one calls this it should be fine.  It looks like we're only 
calling this in the benchmarks, I think it's ok then.  Maybe don't expose it in 
python/r or maybe you could make it private and make the benchmark utilities a 
friend.
   
   > I added memory tracking only on benchmarks that I felt would benefit from 
it. For example, I excluded flight because AFAIK many of the allocations happen 
in gRPC outside of the memory pool, so this method of measurement wouldn't be 
particularly useful there. Do the ones I chose seem reasonable?
   
   No strong opinions.  Seems like a generally harmless addition.  Pity we 
can't get malloc statistics (maybe we could with jemalloc? but that can be for 
a future PR).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to