Joe McDonnell created IMPALA-12038:
--------------------------------------

             Summary: Switch report_benchmark_results.py to Python 3
                 Key: IMPALA-12038
                 URL: https://issues.apache.org/jira/browse/IMPALA-12038
             Project: IMPALA
          Issue Type: Sub-task
          Components: Infrastructure
    Affects Versions: Impala 4.3.0
            Reporter: Joe McDonnell


report_benchmark_results.py is used by the bin/single_node_perf_run.py script 
(which is used by the perf-AB-test Jenkins job). The script compares the results 
stored in two JSON files. In some configurations (e.g. running TPC-DS with many 
iterations), the JSON files are very large (~4GB). In that case, 
report_benchmark_results.py uses so much memory that it can oversubscribe the 
machine.

For this case, Python 2 uses substantially more memory than Python 3:
{noformat}
Python 2 as-is:
Memusage: ~30GB, spiking to 43+GB

real    2m35.975s
user    2m14.102s
sys     0m20.922s

Python 3:
Memusage: ~8GB, spiking to 10.5GB

real    2m5.453s
user    1m55.692s
sys     0m8.946s
{noformat}
I suspect this may be related to differences in Unicode string representation; 
see PEP 393 (Flexible String Representation): 
[https://peps.python.org/pep-0393/]
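
To illustrate what PEP 393 changes, here is a minimal sketch (not taken from report_benchmark_results.py) that measures the marginal storage cost per character of a Python 3 string. The helper name bytes_per_char is hypothetical, and the figures are specific to CPython 3.3 or later. Whether this accounts for the memory difference reported above has not been verified.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Return the marginal storage cost, in bytes, of one more `ch`.

    Hypothetical helper for illustration. Comparing two string lengths
    cancels out the fixed per-object header, leaving only the
    per-character cost chosen by the interpreter.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


if __name__ == "__main__":
    samples = [
        ("ASCII 'a'", "a"),
        ("BMP U+20AC", "\u20ac"),
        ("non-BMP U+1F600", "\U0001F600"),
    ]
    for label, ch in samples:
        print(label, bytes_per_char(ch), "byte(s) per character")
```

On CPython 3.3+, a string is stored with 1, 2, or 4 bytes per character depending on the widest character it contains, so mostly-ASCII JSON keys and values take the compact form.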

Independent of the larger Python 2 to Python 3 migration, we should go ahead 
and migrate this script.



