Hello,

I have noticed that our Python tests have been taking longer and longer recently. To analyse this further, I created PR [1], which makes it possible to record test durations. Please note that I did not dive deep into these numbers and that they have to be taken with a grain of salt, since the slaves have varying resource utilization.
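The code in PR [1] is not reproduced here. As a rough illustration of the idea only, the sketch below records per-test wall-clock durations with the standard library's `unittest` by hooking `startTest`/`stopTest` on a custom result class. It then prints them as "percentage of total, test name, seconds", slowest first. `TimingResult`, `test_times` and `format_durations` are hypothetical names and this is not the PR's implementation.

```python
import time
import unittest


class TimingResult(unittest.TextTestResult):
    """Hypothetical result class that records the wall-clock duration of every test."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.test_times = {}      # test id -> seconds
        self._test_start = None   # perf_counter() value when the current test started

    def startTest(self, test):
        self._test_start = time.perf_counter()
        super().startTest(test)

    def stopTest(self, test):
        super().stopTest(test)
        self.test_times[test.id()] = time.perf_counter() - self._test_start


def format_durations(test_times):
    """Return 'pct% name: secondss' lines, slowest test first (test status is not tracked)."""
    total = sum(test_times.values())
    lines = []
    for name, secs in sorted(test_times.items(), key=lambda kv: kv[1], reverse=True):
        pct = 100.0 * secs / total if total else 0.0
        lines.append(f"{pct:.2f}% {name}: {secs:.4f}s")
    return lines
```

Usage: pass `resultclass=TimingResult` to `unittest.TextTestRunner`, run the suite, then print each line of `format_durations(result.test_times)`.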
Please have a look at the following two logs:

Python3 CPU MKLDNN: http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0
Python3 CPU Openblas: http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0

If you scroll to the end (note that these logs contain multiple test stages, each printing its own summary), you will find the following statements:

Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
Python3 CPU Openblas: "Ran 702 tests in 2158.458s"

This shows that the MKLDNN run is roughly 40% slower overall than the Openblas run (3042s vs. 2158s). Looking at the details, some tests are significantly slower:

Python3 CPU MKLDNN:
> [success] 20.78% test_random.test_shuffle: 630.7165s
> [success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
> [success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
> [success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
> [success] 2.45% test_operator.test_pick: 74.4041s
> [success] 2.39% test_metric_perf.test_metric_performance: 72.5445s
> [success] 2.38% test_random.test_negative_binomial_generator: 72.1751s
> [success] 1.84% test_operator.test_psroipooling: 55.9432s
> [success] 1.78% test_random.test_poisson_generator: 54.0104s
> [success] 1.72% test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s
> [success] 1.60% test_contrib_control_flow.test_cond: 48.6977s
> [success] 1.41% test_random.test_random: 42.8712s
> [success] 1.03% test_operator.test_layer_norm: 31.1242s

Python3 CPU Openblas:
> [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
> [success] 4.34% test_random.test_shuffle: 93.3157s
> [success] 4.31% test_random.test_negative_binomial_generator: 92.6899s
> [success] 3.78% test_sparse_operator.test_elemwise_binary_ops: 81.2048s
> [success] 3.30% test_operator.test_psroipooling: 70.9090s
> [success] 3.20% test_random.test_poisson_generator: 68.7500s
> [success] 3.10% test_metric_perf.test_metric_performance: 66.6085s
> [success] 2.79% test_operator.test_layer_norm: 59.9566s
> [success] 2.66% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s
> [success] 2.62% test_operator.test_pick: 56.2312s
> [success] 2.60% test_random.test_random: 55.8920s
> [success] 2.19% test_operator.test_broadcast_binary_op: 47.1879s
> [success] 0.96% test_contrib_control_flow.test_cond: 20.6908s

Tests worth noting (MKLDNN duration relative to Openblas):
- test_random.test_shuffle: ~6.8x longer (630.7s vs. 93.3s), but I don't know how this may be related to MKLDNN. Are we doing random number generation in either of those backends?
- test_sparse_operator.test_elemwise_binary_ops: ~6.7x longer (540.0s vs. 81.2s)
- test_gluon_model_zoo.test_models: ~41% shorter (331.2s vs. 563.3s). That's awesome and to be expected :)
- test_operator.test_broadcast_binary_op: ~68% longer (79.5s vs. 47.2s)
- test_contrib_control_flow.test_cond: ~2.4x longer (48.7s vs. 20.7s)
- test_operator.test_layer_norm: ~48% shorter (31.1s vs. 60.0s). Nice!

As I stated previously, these numbers might not mean anything since the CI is not a benchmarking environment (sorry if these turn out to be false positives), but I thought they were worth mentioning so that Intel can follow up and dive deeper. Is anybody here running 1:1 operator comparisons (e.g. running layer_norm on each backend to compare performance) who could provide us with those numbers?

Best regards,
Marco

[1]: https://github.com/apache/incubator-mxnet/pull/13377
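For anyone who wants to re-check the per-test comparison, the duration ratios can be recomputed from the two listings with a small script. This is a minimal sketch: the regular expression assumes the `[status] pct% test_name: <seconds>s` line layout shown in the logs above, and `parse_durations` and `compare` are hypothetical helpers, not part of the PR.

```python
import re

# Matches duration lines such as "> [success] 20.78% test_random.test_shuffle: 630.7165s";
# the leading ">" and the space after it are optional, so both log styles above match.
LINE_RE = re.compile(r"\[(\w+)\]\s+[\d.]+%\s+(\S+):\s+([\d.]+)s")


def parse_durations(log_text):
    """Map test name -> duration in seconds for every duration line in log_text."""
    return {name: float(secs) for _status, name, secs in LINE_RE.findall(log_text)}


def compare(baseline, candidate):
    """Return (name, candidate/baseline duration ratio) for tests in both runs, largest first."""
    common = baseline.keys() & candidate.keys()
    ratios = [(name, candidate[name] / baseline[name]) for name in common]
    return sorted(ratios, key=lambda kv: kv[1], reverse=True)
```

Feeding it the Openblas listing as `baseline` and the MKLDNN listing as `candidate` yields ratios above 1 for tests that took longer with MKLDNN and below 1 for tests that got faster. Summary lines such as "Ran 702 tests in ..." are ignored because they do not match the pattern.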