Joe McDonnell created IMPALA-13781:
--------------------------------------

             Summary: report_benchmark_result.py uses the wrong calculation for median diff %
                 Key: IMPALA-13781
                 URL: https://issues.apache.org/jira/browse/IMPALA-13781
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 4.6.0
            Reporter: Joe McDonnell


The benchmark report includes a "Median Diff(%)" column, but the value is 
calculated incorrectly. It can report reductions greater than 100% because it 
divides by the new result rather than the base result:
{noformat}
        # median uses "results", but it should use "ref_results"
        median = results[SORTED][int(len(results[SORTED]) / 2)]
        all_diffs = [x - y for x in results[SORTED] for y in ref_results[SORTED]]
        all_diffs.sort()
        self.median_diff = all_diffs[int(len(all_diffs) / 2)] / median
{noformat}
In an A/B test, the median used as the divisor should be the A value 
(i.e. the base / reference value). Instead, this code uses the B value.
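A minimal sketch of the corrected calculation, assuming the inputs are plain pre-sorted lists of timings rather than the results dictionaries (keyed by SORTED) that report_benchmark_result.py uses. The helper name median_diff_fraction is hypothetical. The only intended change from the snippet above is the divisor: the median of the reference (A) results instead of the new (B) results.

```python
def median_diff_fraction(sorted_results, sorted_ref_results):
    """Median of all pairwise differences (new - base), relative to the base median.

    Hypothetical helper: takes two pre-sorted lists of timings instead of the
    results dicts used by report_benchmark_result.py.
    """
    # Divisor: median of the reference (A / base) run, not the new (B) run.
    ref_median = sorted_ref_results[len(sorted_ref_results) // 2]
    # Every pairwise difference between a new result and a reference result.
    all_diffs = sorted(x - y for x in sorted_results for y in sorted_ref_results)
    return all_diffs[len(all_diffs) // 2] / ref_median


# Base run takes ~10s, new run ~2s: an 80% reduction.
base = [9.0, 10.0, 11.0]
new = [1.5, 2.0, 2.5]
print(median_diff_fraction(new, base))  # prints -0.8
```

For these inputs the median pairwise difference is -8.0. With the buggy divisor (the new median, 2.0) the report would show -8.0 / 2.0 = -4.0, i.e. a -400% change; dividing by the base median gives the expected -80%.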



--
This message was sent by Atlassian Jira
(v8.20.10#820010)