alamb commented on code in PR #6134:
URL: https://github.com/apache/arrow-datafusion/pull/6134#discussion_r1180601182


##########
benchmarks/compare.py:
##########
@@ -64,14 +61,9 @@ def load_from(cls, data: Dict[str, Any]) -> QueryRun:
     def execution_time(self) -> float:
         assert len(self.iterations) >= 1
 
-        # If we don't have enough samples, median() is probably
-        # going to be a worse measure than just an average.
-        if len(self.iterations) < MEAN_THRESHOLD:
-            method = statistics.mean
-        else:
-            method = statistics.median
-
-        return method(iteration.elapsed for iteration in self.iterations)
+        # Use minimum execution time to account for variations / other
+        # things the system was doing
+        return min(iteration.elapsed for iteration in self.iterations)

Review Comment:
   I agree it is misleading -- in terms of measuring a change between datafusion versions, I think `min` will give us the least variance between runs and represents best-case performance.
   
   However, it doesn't really give a sense of how much variation there is across runs 🤔
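
   As an illustration (not part of this PR), here is a rough sketch of how the script could keep `min` as the comparison metric while still surfacing the run-to-run spread; the `summarize` helper and the example timings below are made up for this comment, not actual compare.py code:
   
   ```python
   from typing import List
   
   
   def summarize(elapsed: List[float]) -> str:
       """Best-case time plus the relative spread across iterations."""
       assert len(elapsed) >= 1
       best = min(elapsed)
       worst = max(elapsed)
       # Spread relative to the best run gives a quick sense of
       # run-to-run variation without changing the comparison metric.
       spread_pct = (worst - best) / best * 100 if best > 0 else 0.0
       return f"{best:.3f}s (spread {spread_pct:.1f}%)"
   
   
   print(summarize([1.92, 2.05, 1.98]))  # e.g. "1.920s (spread 6.8%)"
   ```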


