I ran GPU operator benchmarks with CUDA 10.1, comparing MXNet 1.4.1 and
1.5.0.RC2.
Results -
https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50#file-mxnet_opperf_gpu-md

GPU operator benchmark Summary:
1. Most ops are stable across MXNet 1.4.1 and 1.5.0.RC2, with no
regressions.
2. Nice improvements in Dropout and FullyConnected backward.

As shared before, here is individual operator performance on CPU,
comparing MXNet 1.4.1 and 1.5.0.RC2 -
https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50#file-mxnet_opperf_cpu-md

CPU operator benchmark Summary:
1. Many operators improved significantly - Dropout, Convolution, and
reduction ops like max, mean, min, prod, sum.
2. Several operators regressed - BatchNorm, Pooling, batch_dot with
transpose, and all broadcast_* ops.

Best,
Sandeep


On Fri, Jun 28, 2019 at 10:46 AM Davydenko, Denis <
[email protected]> wrote:

> Just to re-iterate, postponing the release until we have a good handle on
> the perf regressions is my #1 choice as well. I am just trying to consider
> alternatives where we can release 1.5.0 and manage potential perf impact...
>
> On 6/28/19, 10:04 AM, "Marco de Abreu" <[email protected]> wrote:
>
>     Hey Denis,
>
>     I don't think something like an experimental release is something the
>     Apache release process supports. Also, I would be afraid of automated
>     systems consuming MXNet by simply fetching the latest release version.
>     These users would then get the experimental version without being
>     aware.
>
>     For the sake of the best user experience, I'd prefer if we could take
>     a few days to track down the root causes of all these regressions.
>     While I agree that releasing the new features and optimizations is
>     certainly overdue, I think the most important point is to keep up with
>     existing users and their trust. If a new release performs worse for
>     the same kind of workload, they might lose trust in our release
>     process and in future might be less willing to adopt a new release
>     early on.
>
>     -Marco
>
>     Davydenko, Denis <[email protected]> schrieb am Fr., 28.
> Juni
>     2019, 18:55:
>
>     > According to Sandeep's evaluation of perf regression on the
>     > operator level [1], out of 290 tests we have 77 op/input
>     > combinations for the forward pass and 50 for the backward pass where
>     > the regression is 5%+ (the biggest regressions observed are about
>     > 86% and 84% respectively). If I raise the degradation threshold to
>     > 10%+, the corresponding numbers are 70 for forward and 42 for
>     > backward. This, from my perspective, constitutes a significant
>     > performance impact, at least on the individual operator level. In
>     > the interest of keeping every release as performant as the previous
>     > one (at least to a feasible extent), I suggest we only move forward
>     > with the 1.5.0 release if we call it experimental. The current
>     > landscape of operators with potentially negative performance impact
>     > on customers could (and I believe will) put MXNet one step behind
>     > its current market position as a choice for performance-optimized DL
>     > workloads. Tagging it as experimental, from my point of view, would
>     > help release the new features so that customers can enjoy them while
>     > we are explicit about the performance optimization work still going
>     > on.
>     >
>     > [1]
>     >
> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
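The thresholding described above can be sketched as follows; the combination names and percentages here are made-up placeholders (the real per-op numbers are in the linked gist):

```python
def count_regressions(results, threshold_pct):
    """Count op/input combinations regressed by at least threshold_pct.

    results maps a combination name to its regression percentage
    (positive = slower in the newer release)."""
    return sum(1 for pct in results.values() if pct >= threshold_pct)

# Placeholder data for illustration only:
forward = {"op_a/shape1": 86.0, "op_b/shape1": 12.0,
           "op_c/shape2": 7.5, "op_d/shape1": 2.0}
print(count_regressions(forward, 5.0))   # combinations regressed 5%+
print(count_regressions(forward, 10.0))  # combinations regressed 10%+
```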
>     >
>     >
>     >
>     > On 6/28/19, 9:38 AM, "Lai Wei" <[email protected]> wrote:
>     >
>     >     Hi,
>     >
>     >     Some more data points:
>     >
>     >     I ran the same cifar10.py script with the same setup, BUT added
>     >     a fixed seed.
>     >
>     >     I ran 50 epochs, with the first 10 epochs as warmup, and got the
>     >     following average time per epoch:
>     >     1.4.1: 164.95 s
>     >     1.5.0: 170.44 s
>     >     Detailed data at [1]
>     >     This is about a 3% regression, less than Manu’s result but closer
>     >     to the Gluon result.
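The percentages quoted in this thread can be reproduced with a one-line helper (`regression_pct` is a hypothetical name; the inputs are the average per-epoch times reported in the thread):

```python
def regression_pct(baseline_s, candidate_s):
    """Percent slowdown of candidate vs baseline (average seconds/epoch)."""
    return (candidate_s - baseline_s) / baseline_s * 100

print(round(regression_pct(164.95, 170.44), 1))  # this run: ~3.3%
print(round(regression_pct(164.23, 174.59), 1))  # Manu's run: ~6.3%
```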
>     >
>     >     As for the operator benchmarks from Sandeep [2], I have
>     >     calculated the percentage of speed increase/regression here [1].
>     >     It looks like not all of the operators mentioned before slowed
>     >     down. Should it be treated as a separate issue, since it is
>     >     testing on fake data with different shapes than the CIFAR10
>     >     dataset? For example, BatchNorm shows no regression in the
>     >     report, but it is slowed down in the cifar10.py script profiling.
>     >
>     >     [1]
> https://gist.github.com/roywei/41fce930f013ff3b54cda6e86eaaf66b
>     >     [2]
>     >
>     >
> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>     >
>     >
>     >     On Fri, Jun 28, 2019 at 2:47 PM Pedro Larroy <
>     > [email protected]>
>     >     wrote:
>     >
>     >     > Thanks Manu.
>     >     >
>     >     > @all: I observed other strange stuff that I don't understand
> at the
>     > moment:
>     >     >
>     >     > I installed the 1.5 RC from pip to check that I'm not doing
>     >     > something wrong when building, and found that CPU usage is
>     >     > quite subpar ( https://imgur.com/fRmbQNc ) compared to a
>     >     > version compiled from source. The pip package is using 4-5 of
>     >     > the 32 cores; when I compile from source I get good core
>     >     > utilization ( https://imgur.com/e8BB425 ). I verified this also
>     >     > on a c5d.18xlarge and on a 32-core AMD bare metal machine.
>     >     >
>     >     > It also seems that the version from pip is using gomp instead
>     >     > of LLVM's omp. I'm not sure why.
>     >     >
>     >     > pip install mxnet==1.5.0b20190627
>     >     > /home/piotr/py3_1.5rc/lib/python3.6/site-packages/mxnet
>     >     > piotr@panther:0: ~/p/l/p/s/mxnet> ldd libmxnet.so | grep omp
>     >     >     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
>     >     > (0x00007f99d1832000)
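The check above can be scripted; `classify_omp` is a hypothetical helper that inspects one line of `ldd` output (in practice you would feed it the `ldd libmxnet.so | grep omp` result):

```python
def classify_omp(ldd_line):
    """Report which OpenMP runtime an ldd output line names."""
    if "libgomp" in ldd_line:
        return "GNU gomp"
    if "libomp" in ldd_line:
        return "LLVM omp"
    return "no OpenMP runtime found"

# Example with the line observed above:
print(classify_omp("libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1"))
```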
>     >     >
>     >     > I tried cifar10 on a bare metal 32-core AMD Zen machine and it
>     >     > is extremely slow; it doesn't seem to make much progress
>     >     > compared to a c5d.18xlarge, and I couldn't even finish 1 epoch.
>     >     > I tried with and without MKL without much success. I will
>     >     > continue digging into this when possible.
>     >     >
>     >     >
>     >     > Pedro.
>     >     >
>     >     > On Thu, Jun 27, 2019 at 9:41 PM Manu Seth <
> [email protected]>
>     > wrote:
>     >     > >
>     >     > > Hi all,
>     >     > >
>     >     > > I ran the same cifar10.py script as Pedro, but for 20
>     >     > > epochs. Treating the first 10 epochs as warm-up, I averaged
>     >     > > the time per epoch over the last 10 epochs.
>     >     > >
>     >     > > With MXNet 1.4.1, the average time is 164.23 s
>     >     > > With MXNet 1.5.0, the average time is 174.59 s (~6.3% regression)
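The warm-up handling described above amounts to dropping the first N epochs before averaging. A small sketch with made-up durations (the real per-epoch numbers are in the gists linked elsewhere in this thread):

```python
def avg_epoch_time(durations_s, warmup=10):
    """Average seconds/epoch after discarding the first `warmup` epochs."""
    timed = durations_s[warmup:]
    return sum(timed) / len(timed)

# Illustrative: 10 warm-up epochs, then 10 timed epochs.
durations = [150.0] * 10 + [164.0, 165.0, 164.5, 163.5, 164.0,
                            164.5, 164.0, 164.5, 164.0, 164.0]
print(round(avg_epoch_time(durations), 2))
```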
>     >     > >
>     >     > >
>     >     > > For a second data point, I ran Gluon speed test benchmark
> script -
>     >     > >
>     >     >
>     >
> https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
>     >     > > using the following command:
>     >     > > python3 benchmark_gluon.py --model 'resnet152_v2'
> --batch-size 128
>     >     > > --num-batches 200 --type 'training'
>     >     > >
>     >     > > I got the following speeds:
>     >     > > With MXNet 1.4.1, average speed is 25.677534 img/s
>     >     > > With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3%
>     > regression)
>     >     > >
>     >     > > Note:
>     >     > > For 1.4.1, I used pip install mxnet-mkl==1.4.1
>     >     > > For 1.5.0, I used pip install mxnet-mkl==1.5.0b20190619,
>     >     > > which corresponds to commit ccbbf6b4b76ea536a6583c99497c83b65a20817b
>     >     > > and is behind the 1.5.x branch by 4 commits
>     >     > >
>     >     > >
>     >     > > Best,
>     >     > > Manu
>     >     > >
>     >     > >
>     >     > > On 6/27/19, 3:37 PM, "sandeep krishnamurthy" <
>     >     > [email protected]>
>     >     > > wrote:
>     >     > >
>     >     > >     Hello Ciyong/Pedro,
>     >     > >
>     >     > >     I ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not
>     >     > >     complete - it doesn’t cover all MXNet operators and is
>     >     > >     not presented in the best possible way; still WIP.)
>     >     > >
>     >     > >
>     >     >
>     >
> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>     >     > >
>     >     > >     The following operators look slower in 1.5 compared to
>     >     > >     1.4.1:
>     >     > >     - BatchNorm
>     >     > >     - Pooling
>     >     > >     - FullyConnected
>     >     > >     - batch_dot
>     >     > >     - Dot
>     >     > >     - broadcast_mul
>     >     > >     - log_softmax
>     >     > >     and a few other operators
>     >     > >
>     >     > >     Also, several operators run a lot faster on 1.5 compared
>     >     > >     to 1.4.1 - for example Convolution, flatten, and
>     >     > >     elementwise operators. So it appears a few operators have
>     >     > >     regressed noticeably, but due to other operator
>     >     > >     performance improvements the end-to-end effect is not
>     >     > >     that significant, hiding a lot of the regression. We need
>     >     > >     a more detailed per-operator performance analysis. We
>     >     > >     will not be able to do this for the current release; we
>     >     > >     should have a more concrete way of determining such
>     >     > >     performance regressions before the next release.
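Sandeep's point about improvements masking regressions can be illustrated with made-up per-op totals: two ops get 30% slower, yet the end-to-end sum still improves because a dominant op got faster.

```python
# Hypothetical per-op totals (ms) for one model run; not real profile data.
v14 = {"BatchNorm": 200.0, "Pooling": 100.0, "Convolution": 700.0}
v15 = {"BatchNorm": 260.0, "Pooling": 130.0, "Convolution": 560.0}

# BatchNorm and Pooling each regress 30%, but Convolution improves 20%
# and dominates, so the total still goes down.
print(sum(v14.values()), sum(v15.values()))
```

This is why end-to-end timings alone cannot rule out per-operator regressions.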
>     >     > >
>     >     > >     Setup:
>     >     > >     1.5 => Built from source (head of the 1.5.rc2 tag), with
>     >     > >     MKLDNN
>     >     > >     1.4.1 => PyPi mxnet-mkl==1.4.1
>     >     > >     Machine: C5.18X
>     >     > >     No explicit environment variables were set
>     >     > >     Operator benchmark code -
>     >     > >
>     >     >
>     >
> https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
>     >     > >
>     >     > >     Best,
>     >     > >     Sandeep
>     >     > >
>     >     > >
>     >     > >     On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
>     >     > > [email protected]>
>     >     > >     wrote:
>     >     > >
>     >     > >     > I will try to run a few benchmarks in a bare metal
> instance
>     >     > tonight to
>     >     > >     > remove virtualization variance for the measurements and
>     > provide
>     >     > some
>     >     > >     > numbers.
>     >     > >     >
>     >     > >     > Please propose a set of models / examples that would
>     >     > >     > be desirable to run before the release, and provide a
>     >     > >     > link to an easy-to-run script with instructions so we
>     >     > >     > can validate the release better.
>     >     > >     >
>     >     > >     > Thank you.
>     >     > >     >
>     >     > >     > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <
>     > [email protected]>
>     >     > wrote:
>     >     > >     > >
>     >     > >     > > Dear @dev,
>     >     > >     > >
>     >     > >     > > I am cancelling the vote for the cached op fix:
>     >     > >     > >
>     >     > >     > > https://github.com/apache/incubator-mxnet/pull/15298
>     >     > >     > >
>     >     > >     > > As for the possible CPU training regression, it does
>     >     > >     > > not look like a blocker for now.
>     >     > >     > >
>     >     > >     > > I will start a new rc2 vote, please help to validate.
>     >     > >     > >
>     >     > >     > > Thanks!
>     >     > >     > >
>     >     > >     > >
>     >     > >     > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <
>     >     > [email protected]
>     >     > > >
>     >     > >     > wrote:
>     >     > >     > >
>     >     > >     > > > Hi Pedro,
>     >     > >     > > >
>     >     > >     > > > I was able to reproduce a similar result (v1.5 is
>     >     > >     > > > ~5.6% slower than v1.4; I was using 18 cores for
>     >     > >     > > > computing) with your script on C5.18xlarge. But I
>     >     > >     > > > needed to bind the cores with the commands below
>     >     > >     > > > when running the script (without setting the env
>     >     > >     > > > variables, I got close times (<1%) with v1.5 and
>     >     > >     > > > v1.4):
>     >     > >     > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
>     >     > >     > > >         export OMP_NUM_THREADS=18
>     >     > >     > > >
>     >     > >     > > > Did you set any env variables when running?
>     >     > >     > > >
>     >     > >     > > > The performance result I got as below:
>     >     > >     > > > 1) 1.4.1.rc0
> (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     >     > >     > > > real    12m10.856s
>     >     > >     > > > user    234m49.576s
>     >     > >     > > > sys     4m38.044s
>     >     > >     > > >
>     >     > >     > > > 2) 1.5.0.rc1
> (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     >     > >     > > > real    12m52.140s
>     >     > >     > > > user    246m30.740s
>     >     > >     > > > sys     5m8.188s
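The ~5.6% figure follows from the wall-clock (`real`) times above; a small parser for the `time` output format (`to_seconds` is a hypothetical helper):

```python
def to_seconds(t):
    """Convert a `time`-style value like '12m10.856s' to seconds."""
    minutes, rest = t.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))

v14 = to_seconds("12m10.856s")  # 1.4.1.rc0 real time
v15 = to_seconds("12m52.140s")  # 1.5.0.rc1 real time
print(round((v15 - v14) / v14 * 100, 1))  # percent slower
```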
>     >     > >     > > >
>     >     > >     > > > Looking at the profiling data, most of the ops
>     >     > >     > > > have the same perf between v1.4 and v1.5, but some
>     >     > >     > > > ops like "_backward_BatchNorm" and "Pooling" are
>     >     > >     > > > ~1.37x slower on v1.5 compared with v1.4.
>     >     > >     > > > I will do further analysis on these ops.
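Per-op profiling comparisons like this can be summarized by the slowdown ratio; the helper and the numbers below are placeholders chosen to mirror the ~1.37x observation, not the actual profile data:

```python
def slowdown_ratios(v14_ms, v15_ms, flag_above=1.1):
    """Map each op to its v1.5/v1.4 time ratio, keeping only ops more
    than flag_above x slower in v1.5."""
    return {op: round(v15_ms[op] / v14_ms[op], 2)
            for op in v14_ms
            if op in v15_ms and v15_ms[op] > flag_above * v14_ms[op]}

# Placeholder per-op totals (milliseconds):
v14 = {"_backward_BatchNorm": 100.0, "Pooling": 100.0, "Convolution": 500.0}
v15 = {"_backward_BatchNorm": 137.0, "Pooling": 137.0, "Convolution": 495.0}
print(slowdown_ratios(v14, v15))
```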
>     >     > >     > > >
>     >     > >     > > > Here's the hardware/OS info from my side:
>     >     > >     > > > ----------Python Info----------
>     >     > >     > > > Version      : 3.6.8
>     >     > >     > > > Compiler     : GCC 7.3.0
>     >     > >     > > > Build        : ('default', 'Dec 30 2018 01:22:34')
>     >     > >     > > > Arch         : ('64bit', '')
>     >     > >     > > > ------------Pip Info-----------
>     >     > >     > > > Version      : 19.0.3
>     >     > >     > > > Directory    :
>     >     > >     > > >
>     >     > >
>     >
> /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
>     >     > >     > > > ----------MXNet Info-----------
>     >     > >     > > > Version      : 1.5.0
>     >     > >     > > > Directory    :
>     > /home/ubuntu/ws/incubator-mxnet/python/mxnet
>     >     > >     > > > Hashtag not found. Not installed from pre-built
> package.
>     >     > >     > > > ----------System Info----------
>     >     > >     > > > Platform     :
>     >     > Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
>     >     > >     > > > system       : Linux
>     >     > >     > > > node         : ip-172-31-32-129
>     >     > >     > > > release      : 4.4.0-1085-aws
>     >     > >     > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32
> UTC
>     > 2019
>     >     > >     > > > ----------Hardware Info----------
>     >     > >     > > > machine      : x86_64
>     >     > >     > > > processor    : x86_64
>     >     > >     > > > Architecture:          x86_64
>     >     > >     > > > CPU op-mode(s):        32-bit, 64-bit
>     >     > >     > > > Byte Order:            Little Endian
>     >     > >     > > > CPU(s):                72
>     >     > >     > > > On-line CPU(s) list:   0-71
>     >     > >     > > > Thread(s) per core:    2
>     >     > >     > > > Core(s) per socket:    18
>     >     > >     > > > Socket(s):             2
>     >     > >     > > > NUMA node(s):          2
>     >     > >     > > > Vendor ID:             GenuineIntel
>     >     > >     > > > CPU family:            6
>     >     > >     > > > Model:                 85
>     >     > >     > > > Model name:            Intel(R) Xeon(R) Platinum
> 8124M
>     > CPU @
>     >     > > 3.00GHz
>     >     > >     > > > Stepping:              3
>     >     > >     > > > CPU MHz:               3000.000
>     >     > >     > > > BogoMIPS:              6000.00
>     >     > >     > > > Hypervisor vendor:     KVM
>     >     > >     > > > Virtualization type:   full
>     >     > >     > > > L1d cache:             32K
>     >     > >     > > > L1i cache:             32K
>     >     > >     > > > L2 cache:              1024K
>     >     > >     > > > L3 cache:              25344K
>     >     > >     > > > NUMA node0 CPU(s):     0-17,36-53
>     >     > >     > > > NUMA node1 CPU(s):     18-35,54-71
>     >     > >     > > > Flags:                 fpu vme de pse tsc msr pae
> mce
>     > cx8 apic
>     >     > > sep mtrr
>     >     > >     > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2
> ss ht
>     > syscall
>     >     > nx
>     >     > >     > pdpe1gb
>     >     > >     > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl
>     > xtopology
>     >     > > nonstop_tsc
>     >     > >     > > > aperfmperf tsc_known_freq pni pclmulqdq monitor
> ssse3
>     > fma cx16
>     >     > > pcid
>     >     > >     > sse4_1
>     >     > >     > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
> xsave
>     > avx
>     >     > f16c
>     >     > > rdrand
>     >     > >     > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single
>     > kaiser
>     >     > > fsgsbase
>     >     > >     > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid
> rtm mpx
>     > avx512f
>     >     > > rdseed
>     >     > >     > adx
>     >     > >     > > > smap clflushopt clwb avx512cd xsaveopt xsavec
> xgetbv1
>     > ida arat
>     >     > pku
>     >     > >     > > > ----------Network Test----------
>     >     > >     > > >
>     >     > >     > > >
>     >     > >     > > > -Ciyong
>     >     > >     > > >
>     >     > >     > > >
>     >     > >     > > > -----Original Message-----
>     >     > >     > > > From: Zhao, Patric [mailto:[email protected]]
>     >     > >     > > > Sent: Thursday, June 27, 2019 9:55 AM
>     >     > >     > > > To: [email protected]
>     >     > >     > > > Cc: [email protected]
>     >     > >     > > > Subject: RE: [VOTE] Release Apache MXNet
> (incubating)
>     > version
>     >     > > 1.5.0.rc1
>     >     > >     > > >
>     >     > >     > > > Could we run more epochs to see the performance
>     >     > >     > > > difference, or profile the difference between the
>     >     > >     > > > good and bad runs?
>     >     > >     > > >
>     >     > >     > > > > -----Original Message-----
>     >     > >     > > > > From: Pedro Larroy [mailto:
>     > [email protected]]
>     >     > >     > > > > Sent: Thursday, June 27, 2019 9:35 AM
>     >     > >     > > > > To: [email protected]
>     >     > >     > > > > Cc: [email protected]
>     >     > >     > > > > Subject: Re: [VOTE] Release Apache MXNet
> (incubating)
>     > version
>     >     > >     > > > > 1.5.0.rc1
>     >     > >     > > > >
>     >     > >     > > > > I ran it again and the gap is bigger again; I
>     >     > >     > > > > guess we need to average the times across several
>     >     > >     > > > > runs:
>     >     > >     > > > >
>     >     > >     > > > > piotr@ip-172-31-63-171
>     > :0:~/deeplearning-benchmark/dawnbench
>     >     > >     > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python
>     > cifar10.py
>     >     > > --epochs 5
>     >     > >     > > > > && time ~/mxnet_1.5/py3_venv/bin/python
> cifar10.py
>     > --epochs 5
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > ImageRecordIOParser2:
>     >     > >     > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
>     >     > 4
>     >     > >     > threads
>     >     > >     > > > > for decoding..
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > > completed
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > ImageRecordIOParser2:
>     >     > >     > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     >     > > threads
>     >     > >     > > > > for decoding..
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > > completed
>     >     > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     >     > >     > > > > Epoch 0, Changed learning rate to 0.05
>     >     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
>     >     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
>     >     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
>     >     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
>     >     > >     > > > > Epoch 0, Batch 199, Speed=384.149839
>     >     > >     > > > > Epoch 0, Duration=140.919567
>     >     > >     > > > > Epoch 0, Training accuracy=0.115169
>     >     > >     > > > > Epoch 0, Validation accuracy=0.141317
>     >     > >     > > > > Epoch 1, Batch 199, Speed=433.380512
>     >     > >     > > > > Epoch 1, Duration=119.553233
>     >     > >     > > > > Epoch 1, Training accuracy=0.170956
>     >     > >     > > > > Epoch 1, Validation accuracy=0.216146
>     >     > >     > > > > Epoch 2, Batch 199, Speed=434.864699
>     >     > >     > > > > Epoch 2, Duration=123.278490
>     >     > >     > > > > Epoch 2, Training accuracy=0.209455
>     >     > >     > > > > Epoch 2, Validation accuracy=0.247296
>     >     > >     > > > > Epoch 3, Batch 199, Speed=433.401854
>     >     > >     > > > > Epoch 3, Duration=118.327797
>     >     > >     > > > > Epoch 3, Training accuracy=0.248701
>     >     > >     > > > > Epoch 3, Validation accuracy=0.302083
>     >     > >     > > > > Epoch 4, Batch 199, Speed=419.713707
>     >     > >     > > > > Epoch 4, Duration=126.468409
>     >     > >     > > > > Epoch 4, Training accuracy=0.260949
>     >     > >     > > > > Epoch 4, Validation accuracy=0.269030
>     >     > >     > > > >
>     >     > >     > > > > real    10m55.796s
>     >     > >     > > > > user    399m33.567s
>     >     > >     > > > > sys     13m55.904s
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > ImageRecordIOParser2:
>     >     > >     > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
>     >     > 4
>     >     > >     > threads
>     >     > >     > > > > for decoding..
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > > completed
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > ImageRecordIOParser2:
>     >     > >     > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     >     > > threads
>     >     > >     > > > > for decoding..
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > > completed
>     >     > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     >     > >     > > > > Epoch 0, Changed learning rate to 0.05
>     >     > >     > > > > Epoch 0, Batch 199, Speed=419.039188
>     >     > >     > > > > Epoch 0, Duration=143.934903
>     >     > >     > > > > Epoch 0, Training accuracy=0.122542
>     >     > >     > > > > Epoch 0, Validation accuracy=0.164359
>     >     > >     > > > > Epoch 1, Batch 199, Speed=445.257048
>     >     > >     > > > > Epoch 1, Duration=135.248399
>     >     > >     > > > > Epoch 1, Training accuracy=0.178828
>     >     > >     > > > > Epoch 1, Validation accuracy=0.199419
>     >     > >     > > > > Epoch 2, Batch 199, Speed=447.115215
>     >     > >     > > > > Epoch 2, Duration=132.003770
>     >     > >     > > > > Epoch 2, Training accuracy=0.217808
>     >     > >     > > > > Epoch 2, Validation accuracy=0.233073
>     >     > >     > > > > Epoch 3, Batch 199, Speed=441.079477
>     >     > >     > > > > Epoch 3, Duration=126.543316
>     >     > >     > > > > Epoch 3, Training accuracy=0.248102
>     >     > >     > > > > Epoch 3, Validation accuracy=0.293870
>     >     > >     > > > > Epoch 4, Batch 199, Speed=449.329787
>     >     > >     > > > > Epoch 4, Duration=138.398325
>     >     > >     > > > > Epoch 4, Training accuracy=0.270021
>     >     > >     > > > > Epoch 4, Validation accuracy=0.311498
>     >     > >     > > > >
>     >     > >     > > > > real    11m45.329s
>     >     > >     > > > > user    426m13.908s
>     >     > >     > > > > sys     16m45.093s
>     >     > >     > > > >
>     >     > >     > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
>     >     > >     > > > > <[email protected]> wrote:
>     >     > >     > > > > >
>     >     > >     > > > > > The difference looks smaller now, more like
> your
>     > numbers. I
>     >     > > wonder
>     >     > >     > > > > > if something happened during the previous
> benchmark
>     > like a
>     >     > > system
>     >     > >     > > > > > update...
>     >     > >     > > > > >
>     >     > >     > > > > >
>     >     > >     > > > > > piotr@ip-172-31-63-171
>     >     > :0:~/deeplearning-benchmark/dawnbench
>     >     > >     > > > > (master)+$
>     >     > >     > > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py
>     > --epochs 5
>     >     > &&
>     >     > > time
>     >     > >     > > > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py
> --epochs 5
>     >     > > [22:49:41]
>     >     > >     > > > > > ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > > ImageRecordIOParser2:
>     >     > >     > > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/train.rec,
>     >     > use 4
>     >     > >     > > > > > threads for decoding..
>     >     > >     > > > > > [22:49:41]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load
>     >     > mean
>     >     > > image
>     >     > >     > > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > > [22:49:41]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load
>     >     > mean
>     >     > > image
>     >     > >     > > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > completed
>     >     > >     > > > > > [22:49:41]
> ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > > ImageRecordIOParser2:
>     >     > >     > > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/test.rec,
>     >     > use 4
>     >     > >     > > > > > threads for decoding..
>     >     > >     > > > > > [22:49:41]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load
>     >     > mean
>     >     > > image
>     >     > >     > > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > > [22:49:41]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load
>     >     > mean
>     >     > > image
>     >     > >     > > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > completed
>     >     > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     >     > >     > > > > > Epoch 0, Changed learning rate to 0.05
>     >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
>     >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
>     >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
>     >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
>     >     > >     > > > > > Epoch 0, Batch 199, Speed=426.182733
>     >     > >     > > > > > Epoch 0, Duration=134.868458
>     >     > >     > > > > > Epoch 0, Training accuracy=0.127238
>     >     > >     > > > > > Epoch 0, Validation accuracy=0.206388
>     >     > >     > > > > > Epoch 1, Batch 199, Speed=313.127156
>     >     > >     > > > > > Epoch 1, Duration=128.041775
>     >     > >     > > > > > Epoch 1, Training accuracy=0.182065
>     >     > >     > > > > > Epoch 1, Validation accuracy=0.202524
>     >     > >     > > > > > Epoch 2, Batch 199, Speed=410.931187
>     >     > >     > > > > > Epoch 2, Duration=124.920588
>     >     > >     > > > > > Epoch 2, Training accuracy=0.202584
>     >     > >     > > > > > Epoch 2, Validation accuracy=0.245693
>     >     > >     > > > > > Epoch 3, Batch 199, Speed=419.119335
>     >     > >     > > > > > Epoch 3, Duration=120.948349
>     >     > >     > > > > > Epoch 3, Training accuracy=0.235854
>     >     > >     > > > > > Epoch 3, Validation accuracy=0.291066
>     >     > >     > > > > > Epoch 4, Batch 199, Speed=430.473733
>     >     > >     > > > > > Epoch 4, Duration=130.181724
>     >     > >     > > > > > Epoch 4, Training accuracy=0.257773
>     >     > >     > > > > > Epoch 4, Validation accuracy=0.304988
>     >     > >     > > > > >
>     >     > >     > > > > > real    11m7.356s
>     >     > >     > > > > > user    406m9.910s
>     >     > >     > > > > > sys     14m18.349s
> [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
> [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
> [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> Epoch 0, Changed learning rate to 0.05
> Epoch 0, Batch 199, Speed=348.618154 Epoch 0, Duration=146.469352 Epoch 0, Training accuracy=0.124121 Epoch 0, Validation accuracy=0.167227
> Epoch 1, Batch 199, Speed=452.790825 Epoch 1, Duration=130.199421 Epoch 1, Training accuracy=0.183863 Epoch 1, Validation accuracy=0.237079
> Epoch 2, Batch 199, Speed=451.406559 Epoch 2, Duration=126.320823 Epoch 2, Training accuracy=0.214844 Epoch 2, Validation accuracy=0.244692
> Epoch 3, Batch 199, Speed=403.161873 Epoch 3, Duration=125.331660 Epoch 3, Training accuracy=0.243506 Epoch 3, Validation accuracy=0.301182
> Epoch 4, Batch 199, Speed=450.826598 Epoch 4, Duration=126.426253 Epoch 4, Training accuracy=0.266424 Epoch 4, Validation accuracy=0.311899
>
> real    11m21.930s
> user    415m3.855s
> sys     13m53.975s
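The lr_schedule dict printed at the start of the second run maps a starting epoch to a learning rate. A minimal, MXNet-independent sketch of how such a step schedule resolves the rate for a given epoch (the helper name lr_for_epoch is illustrative, and the 0.005000000000000001 from the log is rounded to 0.005 here):

```python
# Step schedule as printed in the log: key = first epoch the rate applies to
lr_schedule = {0: 0.05, 82: 0.005, 123: 0.0005, 300: 0.0001}

def lr_for_epoch(epoch, schedule):
    """Return the rate of the latest boundary not exceeding `epoch`."""
    rate = None
    for boundary in sorted(schedule):
        if epoch >= boundary:
            rate = schedule[boundary]
    return rate

print(lr_for_epoch(0, lr_schedule))    # 0.05
print(lr_for_epoch(100, lr_schedule))  # 0.005
```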
>
> On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy <[email protected]> wrote:
> >
> > Hi Ciyong, thanks for trying to reproduce:
> >
> > I used this one:
> > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> >
> > Could you provide hardware and OS details?
> >
> > I will rerun and repost numbers in a few minutes.
> >
> > Pedro.
> >
> > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong <[email protected]> wrote:
> > >
> > > Hi Pedro,
> > >
> > > I'm looking at this case, using the script "incubator-mxnet/example/image-classification/train_cifar10.py" to get the timing data, but it seems there's not much difference between mxnet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> > >
> > > Not sure if there's any difference in the python script; can you point me to the link to get your script (cifar10.py)? Or you can also try MXNet's script (train_cifar10.py) and see the performance.
> > >
> > > Here's the command I used to collect the time:
> > >         python train_cifar10.py --num-epoch=5
> > >
> > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > >         real    9m4.880s
> > >         user    333m13.340s
> > >         sys     14m36.100s
> > >
> > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > >         real    9m2.155s
> > >         user    329m37.092s
> > >         sys     16m8.668s
> > >
> > > -Ciyong
> > >
> > >
> > > -----Original Message-----
> > > From: Pedro Larroy [mailto:[email protected]]
> > > Sent: Wednesday, June 26, 2019 6:28 AM
> > > To: [email protected]
> > > Cc: [email protected]
> > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > >
> > > Hi, these were my build flags and system info:
> > >
> > > --- # CMake configuration
> > > USE_CUDA: "OFF" # Build with CUDA support
> > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> > > USE_OPENCV: "ON" # Build with OpenCV support
> > > USE_OPENMP: "ON" # Build with Openmp support
> > > USE_CUDNN: "ON" # Build with cudnn support # one could set CUDNN_ROOT for search path
> > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
> > > USE_F16C: "ON" # Build with x86 F16C instruction support # autodetects support if "ON"
> > > USE_LAPACK: "ON" # Build with lapack support
> > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
> > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> > > USE_PROFILER: "ON" # Build with Profiler support
> > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
> > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
> > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune) # one could set VTUNE_ROOT for search path
> > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
> > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> > > USE_TENSORRT: "OFF" # Enable inference optimization with TensorRT.
> > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
> > > CMAKE_BUILD_TYPE: "Release"
> > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> > >
> > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
> > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
> > >
> > > curl http://169.254.169.254/latest/meta-data/instance-type
> > > c5d.18xlarge
> > >
> > >
> > > ----------Python Info----------
> > > Version      : 3.6.7
> > > Compiler     : GCC 8.2.0
> > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > Arch         : ('64bit', 'ELF')
> > > ------------Pip Info-----------
> > > Version      : 19.1.1
> > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
> > > ----------MXNet Info-----------
> > > Version      : 1.5.0
> > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> > > Hashtag not found. Not installed from pre-built package.
> > > ----------System Info----------
> > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > system       : Linux
> > > node         : ip-172-31-63-171
> > > release      : 4.15.0-1035-aws
> > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > ----------Hardware Info----------
> > > machine      : x86_64
> > > processor    : x86_64
> > > Architecture:        x86_64
> > > CPU op-mode(s):      32-bit, 64-bit
> > > Byte Order:          Little Endian
> > > CPU(s):              72
> > > On-line CPU(s) list: 0-71
> > > Thread(s) per core:  2
> > > Core(s) per socket:  18
> > > Socket(s):           2
> > > NUMA node(s):        2
> > > Vendor ID:           GenuineIntel
> > > CPU family:          6
> > > Model:               85
> > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > Stepping:            4
> > > CPU MHz:             1326.446
> > > BogoMIPS:            6000.00
> > > Hypervisor vendor:   KVM
> > > Virtualization type: full
> > > L1d cache:           32K
> > > L1i cache:           32K
> > > L2 cache:            1024K
> > > L3 cache:            25344K
> > > NUMA node0 CPU(s):   0-17,36-53
> > > NUMA node1 CPU(s):   18-35,54-71
> > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
> > > ----------Network Test----------
> > >
> > > ----------Python Info----------
> > > Version      : 3.6.7
> > > Compiler     : GCC 8.2.0
> > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > Arch         : ('64bit', 'ELF')
> > > ------------Pip Info-----------
> > > Version      : 19.1.1
> > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
> > > ----------MXNet Info-----------
> > > Version      : 1.4.1
> > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> > > Hashtag not found. Not installed from pre-built package.
> > > ----------System Info----------
> > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > system       : Linux
> > > node         : ip-172-31-63-171
> > > release      : 4.15.0-1035-aws
> > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > ----------Hardware Info----------
> > > machine      : x86_64
> > > processor    : x86_64
> > > Architecture:        x86_64
> > > CPU op-mode(s):      32-bit, 64-bit
> > > Byte Order:          Little Endian
> > > CPU(s):              72
> > > On-line CPU(s) list: 0-71
> > > Thread(s) per core:  2
> > > Core(s) per socket:  18
> > > Socket(s):           2
> > > NUMA node(s):        2
> > > Vendor ID:           GenuineIntel
> > > CPU family:          6
> > > Model:               85
> > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > Stepping:            4
> > > CPU MHz:             1223.344
> > > BogoMIPS:            6000.00
> > > Hypervisor vendor:   KVM
> > > Virtualization type: full
> > > L1d cache:           32K
> > > L1i cache:           32K
> > > L2 cache:            1024K
> > > L3 cache:            25344K
> > > NUMA node0 CPU(s):   0-17,36-53
> > > NUMA node1 CPU(s):   18-35,54-71
> > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
> > > ----------Network Test----------
> > >
> > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy <[email protected]> wrote:
> > > >
> > > > I did a training run of cifar10 on CPU and it seems there are some regressions, in the range of a 7% increase in training time against 1.4.1:
> > > >
> > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time python cifar10.py --epochs 5
> > > > real    11m30.388s
> > > > user    417m7.766s
> > > > sys     16m57.315s
> > > >
> > > > VS 1.4.1:
> > > > real    10m41.994s
> > > > user    392m40.646s
> > > > sys     12m30.601s
> > > >
> > > >
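The roughly 7% slowdown quoted in the message above can be checked from the real wall-clock times. A small sketch that parses the time(1)-style durations (values copied from the message; to_seconds is an illustrative helper, not from any script in the thread):

```python
def to_seconds(t):
    """Parse a time(1)-style duration such as '11m30.388s' into seconds."""
    minutes, rest = t.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))

new = to_seconds("11m30.388s")  # 1.5.0 run
old = to_seconds("10m41.994s")  # 1.4.1 run
print(f"slowdown: {100 * (new / old - 1):.1f}%")  # about 7.5%
```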
> > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <[email protected]> wrote:
> > > > >
> > > > > Hi Anirudh,
> > > > >
> > > > > Thanks for jumping on this quickly; I followed up on the issue.
> > > > >
> > > > > It was meant for sockeye developers/maintainers to help set up nightly tests and raise issues early.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin <[email protected]> wrote:
> > > > > >
> > > > > > In GluonNLP we are testing with the MXNet nightly build for each PR, and we did find some MXNet related issues caught by the CI. I recommend other toolkits also add integration tests with the MXNet nightly. It helps identify issues early.
> > > > > >
> > > > > > Best,
> > > > > > Haibin
> > > > > >
> > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <[email protected]> wrote:
> > > > > > >
> > > > > > > Thanks for raising the issue; we will take a look ASAP.
> > > > > > >
> > > > > > > The downstream cases are not in the MXNet CI, so it's hard for MXNet developers to catch the potential bugs or performance degradation.
> > > > > > >
> > > > > > > In the future, I suggest adding the major downstream test cases, like those from sockeye, GluonNLP, GluonCV, DGL, and Gluon-TS, into the nightly test. If that's still too heavy, maybe test weekly or monthly :)
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > --Patric
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Anirudh Subramanian [mailto:[email protected]]
> > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > > > > To: [email protected]
> > > > > > > > Cc: [email protected]
> > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > > > > > > >
> > > > > > > > Hi Lai,
> > > > > > > >
> > > > > > > > I have opened an issue: https://github.com/apache/incubator-mxnet/issues/15297
> > > > > > > > I came to know about this issue only today, and I have not been monitoring sockeye. I jumped onto this issue to make sure it wasn't caused by the dlpack changes. Also, I don't think the sockeye CI checks against master; it is using 1.4.1.
> > > > > > > >
> > > > > > > > Anirudh
> > > > > > > >
> > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Could you share which test failed and what the crash was? How can we reproduce it?
> > > > > > > > >
> > > > > > > > > I was able to install sockeye and all tests passed, using python setup.py test.
> > > > > > > > >
> > > > > > > > > I have tested both the nightly pip package and 1.5.0.rc1.
> > > > > > > > >
> > > > > > > > > It would be great to create an issue with reproducible steps and move the discussion there.
> > > > > > > > >
> > > > > > > > > Also, I see the sockeye nightly build [1] has been failing for some time; if it's due to an MXNet change, please raise this early so we can track and solve it in time, rather than block the release during vote time.
> > > > > > > > >
> > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > > > > >
> > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian

-- 
Sandeep Krishnamurthy
