westonpace commented on a change in pull request #12702: URL: https://github.com/apache/arrow/pull/12702#discussion_r834734593
########## File path: cpp/src/arrow/compute/exec/aggregate_node.cc ########## @@ -168,17 +168,26 @@ class ScalarAggregateNode : public ExecNode { Status DoConsume(const ExecBatch& batch, size_t thread_index) { util::tracing::Span span; - START_SPAN(span, "Consume", - {{"aggregate", ToStringExtra()}, - {"node.label", label()}, - {"batch.length", batch.length}}); + START_SPAN( + span, "Consume", + {{"aggregate", ToStringExtra()}, + {"node.label", label()}, + {"batch.length", batch.length}, + {"memory_pool_bytes", + plan_->exec_context()->memory_pool()->bytes_allocated()}, + {"memory_used", arrow::internal::tracing::GetMemoryUsed()}, + {"memory_used_process", arrow::internal::tracing::GetMemoryUsedByProcess()}}); Review comment: Background thread on a fixed schedule was the original ask but I think there was some concern around having two streams of telemetry information. As a start I think only using `memory_pool_bytes` would be sufficient. Longer term it could be handy to have some idea of the RSS of the process just to have some sense for how we are doing on fragmentation and non-pool allocations. Since a system call is expensive I wonder if there would be a way to debounce this information and only report it at most every 10ms or on some configurable threshold. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org