crepererum commented on code in PR #5377:
URL: https://github.com/apache/arrow-datafusion/pull/5377#discussion_r1117926952


##########
datafusion/physical-expr/src/aggregate/count_distinct.rs:
##########
@@ -216,23 +216,19 @@ impl Accumulator for DistinctCountAccumulator {
     }
 
     fn size(&self) -> usize {
+        // temporarily calculating the size approximately, taking first batch 
size * number of batches
+        // such approach has some inaccuracy for variable length values, like 
strings.

Review Comment:
   This is clearly a conflict of interests: you wanna fix the benchmark, I 
wanna have proper memory accounting. We likely can have both on the long run, 
but due to limited development resources, we cannot have the ideal solution 
right now. 
   
   I'll leave it to the project managers (e.g. @alamb) to decide what's more 
pressing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to