rluvaton commented on code in PR #19526:
URL: https://github.com/apache/datafusion/pull/19526#discussion_r2701276245


##########
datafusion/physical-plan/src/aggregates/group_values/multi_group_by/mod.rs:
##########
@@ -46,11 +46,45 @@ use datafusion_execution::memory_pool::proxy::{HashTableAllocExt, VecAllocExt};
 use datafusion_expr::EmitTo;
 use datafusion_physical_expr::binary_map::OutputType;
 
+use datafusion_execution::metrics::{
+    ExecutionPlanMetricsSet, Gauge, MetricBuilder, Time,
+};
 use hashbrown::hash_table::HashTable;
 
 const NON_INLINED_FLAG: u64 = 0x8000000000000000;
 const VALUE_MASK: u64 = 0x7FFFFFFFFFFFFFFF;
 
+pub(crate) struct MultiColumnGroupByMetrics {
+    /// Time spent hashing the grouping columns
+    pub(crate) time_hashing_grouping_columns: Time,
+
+    /// Time spent building and appending the hash table for grouping columns
+    pub(crate) build_hash_table_time: Time,
+
+    /// Track the maximum number of entries that map held
+    ///
+    /// A very large value likely indicates problems with fetching from the hash table
+    pub(crate) maximum_number_of_entries_in_map: Gauge,
+
+    /// Maximum hash map capacity
+    pub(crate) maximum_hash_map_capacity: Gauge,

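To make the shape of these metrics concrete, here is a minimal sketch (not part of the PR diff) of how such a struct could be wired up through `MetricBuilder`. The `new` constructor, the metric names, and the `datafusion::physical_plan::metrics` import path are illustrative assumptions; only the `ExecutionPlanMetricsSet`, `MetricBuilder`, `Time`, and `Gauge` types come from the diff above.

```rust
// Illustrative sketch only -- not the PR's implementation.
// Uses the `datafusion::physical_plan::metrics` re-export rather than the
// crate path shown in the diff; metric names are hypothetical.
use datafusion::physical_plan::metrics::{
    ExecutionPlanMetricsSet, Gauge, MetricBuilder, Time,
};

pub(crate) struct MultiColumnGroupByMetrics {
    pub(crate) time_hashing_grouping_columns: Time,
    pub(crate) build_hash_table_time: Time,
    pub(crate) maximum_number_of_entries_in_map: Gauge,
    pub(crate) maximum_hash_map_capacity: Gauge,
}

impl MultiColumnGroupByMetrics {
    pub(crate) fn new(metrics: &ExecutionPlanMetricsSet, partition: usize) -> Self {
        Self {
            // `subset_time` registers a named `Time` reported as a subset of
            // the operator's elapsed compute time.
            time_hashing_grouping_columns: MetricBuilder::new(metrics)
                .subset_time("time_hashing_grouping_columns", partition),
            build_hash_table_time: MetricBuilder::new(metrics)
                .subset_time("build_hash_table_time", partition),
            maximum_number_of_entries_in_map: MetricBuilder::new(metrics)
                .gauge("maximum_number_of_entries_in_map", partition),
            maximum_hash_map_capacity: MetricBuilder::new(metrics)
                .gauge("maximum_hash_map_capacity", partition),
        }
    }
}
```

At a call site, the `Time` metrics would typically be driven by a scoped timer, e.g. `let _timer = metrics.time_hashing_grouping_columns.timer();` around the hashing loop, so the elapsed time is accumulated when the guard is dropped.
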
Review Comment:
   Tracks how the hash map's capacity grows during execution.
   
   This is different from the maximum number of entries: the element count alone doesn't tell you whether the hash map fits in cache, or in which cache level. For a vector, only the number of elements matters, so you can tell from the length whether it fits in cache; a hash map's entries, by contrast, are scattered across its whole allocated capacity.
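As a hedged illustration of that point (not code from the PR; `update_hash_table_gauges` is a hypothetical helper), the two gauges could be kept up to date after each batch of inserts, with capacity rather than element count governing the table's memory footprint:

```rust
use datafusion::physical_plan::metrics::Gauge;
use hashbrown::hash_table::HashTable;

/// Hypothetical helper: record high-water marks after inserting into the
/// group-by hash table. Only `Gauge::set` / `Gauge::value` are used to keep
/// the maximum seen so far.
fn update_hash_table_gauges<T>(
    map: &HashTable<T>,
    max_entries: &Gauge,
    max_capacity: &Gauge,
) {
    // Element count: how many distinct groups have been seen so far.
    max_entries.set(max_entries.value().max(map.len()));

    // Capacity: the number of slots actually allocated. A Vec<T> of len n
    // occupies roughly n * size_of::<T>() contiguous bytes, so its cache
    // footprint follows len; a hash table's entries are scattered over all
    // `capacity()` slots, so whether probes stay in cache depends on the
    // allocated capacity (and load factor), not on len alone.
    max_capacity.set(max_capacity.value().max(map.capacity()));
}
```
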



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
