alamb commented on code in PR #6134:
URL: https://github.com/apache/arrow-datafusion/pull/6134#discussion_r1180601182
##########
benchmarks/compare.py:
##########
@@ -64,14 +61,9 @@ def load_from(cls, data: Dict[str, Any]) -> QueryRun:
def execution_time(self) -> float:
assert len(self.iterations) >= 1
- # If we don't have enough samples, median() is probably
- # going to be a worse measure than just an average.
- if len(self.iterations) < MEAN_THRESHOLD:
- method = statistics.mean
- else:
- method = statistics.median
-
- return method(iteration.elapsed for iteration in self.iterations)
+ # Use minimum execution time to account for variations / other
+ # things the system was doing
+ return min(iteration.elapsed for iteration in self.iterations)
Review Comment:
I agree it is misleading -- in terms of measuring a change between
datafusion versions, I think `min` will give us the least variance between runs
and represent the best-case performance.
However, it doesn't really give a sense of how much variation there is across
runs 🤔
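
As an illustration of reporting both numbers, here is a minimal sketch (not part of this PR; the `spread` property and the sample timings are hypothetical) of keeping `min` as the headline figure while still surfacing run-to-run variation:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Iteration:
    elapsed: float  # wall-clock seconds for one run of the query


@dataclass
class QueryRun:
    iterations: List[Iteration]

    @property
    def execution_time(self) -> float:
        assert len(self.iterations) >= 1
        # min: lowest variance between benchmark runs, best-case performance
        return min(i.elapsed for i in self.iterations)

    @property
    def spread(self) -> float:
        # Relative spread (slowest / fastest - 1) as a rough indicator of
        # run-to-run variation; 0.0 means every iteration took the same time
        times = [i.elapsed for i in self.iterations]
        return max(times) / min(times) - 1.0


# Example: three iterations with slightly different timings
run = QueryRun(iterations=[Iteration(1.02), Iteration(0.98), Iteration(1.10)])
print(f"{run.execution_time:.2f}s (+{run.spread:.0%} spread)")  # 0.98s (+12% spread)
```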