I ran operator benchmarks on GPU with CUDA 10.1, comparing MXNet 1.4.1 and 1.5.0.rc2. Results: https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50#file-mxnet_opperf_gpu-md
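For anyone who wants to spot-check a single operator outside the full opperf suite, a minimal timing sketch along the following lines can be used. The operator, shapes, warmup/run counts, and context below are illustrative assumptions, not the opperf configuration used for the gist.

    import time
    import mxnet as mx

    def time_op(op, shapes, ctx=mx.cpu(), warmup=10, runs=50, **op_kwargs):
        # Average wall-clock time of one imperative operator call.
        # `shapes` maps argument names to input shapes; inputs are random uniform.
        args = {name: mx.nd.random.uniform(shape=shape, ctx=ctx)
                for name, shape in shapes.items()}
        for _ in range(warmup):
            op(**args, **op_kwargs)
        mx.nd.waitall()                  # drain the async engine before timing
        start = time.time()
        for _ in range(runs):
            op(**args, **op_kwargs)
        mx.nd.waitall()                  # make sure all pending work is counted
        return (time.time() - start) / runs

    # Run the same script under the 1.4.1 and 1.5.0.rc2 installs and compare the numbers;
    # use ctx=mx.gpu(0) to mirror the CUDA comparison above.
    print("Dropout forward: %.6f s" % time_op(mx.nd.Dropout, {"data": (32, 3, 256, 256)}, p=0.5))
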
GPU operator benchmark summary:
1. Most operators are stable across MXNet 1.4.1 and 1.5.0.rc2. No regressions.
2. Nice improvements on Dropout and FullyConnected backward.

As shared before for CPU - https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50#file-mxnet_opperf_cpu-md - comparing individual operator performance between MXNet 1.4.1 and 1.5.0.rc2:

CPU operator benchmark summary:
1. Many operators have improved significantly - Dropout, Convolution, and reduction ops such as max, mean, min, prod, sum.
2. Several operators have regressed - BatchNorm, Pooling, batch_dot with transpose, and all broadcast_* ops.

Best,
Sandeep

On Fri, Jun 28, 2019 at 10:46 AM Davydenko, Denis < [email protected]> wrote: > Just to re-iterate, postponing the release until we have a strong hold on > the perf regression is my #1 choice as well. I am just trying to consider > alternatives where we can release 1.5.0 and manage the potential perf impact... > > On 6/28/19, 10:04 AM, "Marco de Abreu" <[email protected]> wrote: > > Hey Denis, > > I don't think something like an experimental release is something that > the > Apache release process supports. Also, I would be afraid of automated > systems consuming MXNet by simply fetching the latest release version. > These users would then get the experimental version without being > aware. > > For the sake of the best user experience, I'd prefer if we could take > a few > days to track down the root causes for all these regressions. While I > agree > that releasing the new features and optimizations is certainly > overdue, I > think that the most important point is to keep up with the existing > users > and their trust. If a new release performs worse for the same kind of > workload, they might lose trust in our release process and in future > might be less willing to adopt a new release early on. > > -Marco > > Davydenko, Denis <[email protected]> schrieb am Fr., 28. > Juni > 2019, 18:55: > > > According to Sandeep's evaluation of perf regression at the operator > level [1] > > we have 77 op/input combinations for the forward pass and 50 for the > backward pass > > where the regression is 5%+ (the biggest regressions observed are about 86% > and 84% > > respectively) out of 290 tests. If I raise the threshold of degradation > to 10%+ > > the corresponding numbers are 70 for forward and 42 for backward. This, > from my > > perspective, constitutes a significant performance impact, at > least at the > > individual operator level. In light of keeping every next release as > > performant as the previous one (at least to a feasible extent) I suggest we > can only > > move forward with the 1.5.0 release if we call it experimental. The current > > landscape of operators having a potentially negative performance > impact on > > customers could (and I consider it will) put MXNet one step behind > its > > current market position of being a choice for performance-optimized > DL > > workloads. Tagging it as experimental, from my point of view, would > help to > > release new features so that customers could enjoy them while being > > explicit about the performance optimization work going on. > > > > [1] > > > https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50 > > > > > > > > On 6/28/19, 9:38 AM, "Lai Wei" <[email protected]> wrote: > > > > Hi, > > > > Some more data points: > > > > I ran the same cifar10.py script with the same setup, BUT added a > fixed > > seed. > > > > Ran 50 epochs, with the first 10 epochs as warmup.
> > I have the following average time per epoch: > > 1.4.1: 164.95 s > > 1.5.0: 170.44 s > > Detailed data at [1] > > This is about a 3% regression, less than Manu’s result but > closer to > > the > > Gluon result. > > > > As for the operator benchmarks from Sandeep [2], I have > calculated the > > percentage of speed increase/regression here [1]. It looks like not > all > > operators mentioned before slowed down. Should it be treated as > a > > separate > > issue, as it’s testing on fake data with different shapes than the > CIFAR10 > > dataset? For example, batch norm has no regression in the report > but > > it’s > > slowed down in the cifar10.py script profiling. > > > > [1] > https://gist.github.com/roywei/41fce930f013ff3b54cda6e86eaaf66b > > [2] > > > > > https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50 > > > > > > On Fri, Jun 28, 2019 at 2:47 PM Pedro Larroy < > > [email protected]> > > wrote: > > > Thanks Manu. > > > > > > @all: I observed other strange stuff that I don't understand > at the > > moment: > > > > > > I installed the 1.5 RC from pip to check that I'm not doing > something > > > wrong when building, and I found that CPU usage is > quite > > > subpar ( https://imgur.com/fRmbQNc ) compared to a version > compiled > > > from source. The pip package is using 4-5 cores of the 32. > When I > > > compile from source I get good core utilization. ( > > > https://imgur.com/e8BB425 ). I verified this also on a > c5d.18xlarge > > > and a 32-core AMD bare metal machine. > > > > > > It also seems that the version from pip is using gomp > instead of > > > llvm's omp. I'm not sure why. > > > > > > pip install mxnet==1.5.0b20190627 > > > /home/piotr/py3_1.5rc/lib/python3.6/site-packages/mxnet > > > piotr@panther:0: ~/p/l/p/s/mxnet> ldd libmxnet.so | grep omp > > > libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 > > > (0x00007f99d1832000) > > > > > > I tried cifar10 on a bare metal 32-core AMD Zen machine and it is > > > extremely slow; it doesn't seem to make much progress when > compared to > > a > > > c5d.18xlarge. I couldn't even do 1 epoch; I tried with and > without MKL > > > without much success. I will continue digging into this when > possible. > > > > > > > > > Pedro. > > > > > > On Thu, Jun 27, 2019 at 9:41 PM Manu Seth < > [email protected]> > > wrote: > > > > > > > > Hi all, > > > > > > > > I ran the same cifar10.py script as Pedro, but for 20 epochs. > > Considering > > > > the first 10 epochs for warm-up, I averaged the time per epoch > for the > > last > > > 10 > > > > epochs.
> > > > > > > > With MXNet 1.4.1, the average time is 164.23 s > > > > With MXNet 1.5.0, the average time is 174.59 s (~6.3% regression) > > > > > > > > > > > > For a second data point, I ran the Gluon speed test benchmark > script - > > > > > > > > > > https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py > > > > using the following command: > > > > python3 benchmark_gluon.py --model 'resnet152_v2' > --batch-size 128 > > > > --num-batches 200 --type 'training' > > > > > > > > I got the following speeds: > > > > With MXNet 1.4.1, the average speed is 25.677534 img/s > > > > With MXNet 1.5.0, the average speed is 25.082130 img/s (~2.3% > regression) > > > > > > > > Note: > > > > For the 1.4.1 version, I used pip install mxnet-mkl==1.4.1 > > > > For the 1.5.0 version, I used pip install > mxnet-mkl==1.5.0b20190619 > > which > > > > corresponds to commit# > ccbbf6b4b76ea536a6583c99497c83b65a20817b > > which is > > > > behind the 1.5.x branch by 4 commits > > > > > > > > > > > > Best, > > > > Manu > > > > > > > > > > > > On 6/27/19, 3:37 PM, "sandeep krishnamurthy" < > > > [email protected]> > > > > wrote: > > > > > > > > Hello Ciyong/Pedro, > > > > > > > > I ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not > complete, > > > doesn’t > > > > cover all MXNet operators, not presented in the best > possible way, > > still > > > > WIP) > > > > > > > > > > > > > > https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50 > > > > > > > > The following operators look slower in 1.5 compared to > 1.4.1: > > > > - BatchNorm > > > > - Pooling > > > > - FullyConnected > > > > - batch_dot > > > > - Dot > > > > - broadcast_mul > > > > - log_softmax > > > > and a few other operators > > > > > > > > Also, several operators run a lot faster on 1.5 > compared to > > 1.4.1. > > > For > > > > example - Convolution, flatten, elementwise operators, > etc. So > > it looks > > > like > > > > a few operators have regressed noticeably; however, > due > > to other > > > > operator performance improvements, the net effect is not > that > > > > significant, > > > > which hides a lot of regression. We need a more detailed > analysis of > > > per-operator > > > > performance. We will not be able to do this for the current > > release; we > > > > should > > > > have a more concrete way of determining such performance > > regressions > > > > before the > > > > next release. > > > > > > > > Setup: > > > > 1.5 => built from source (head of the 1.5.rc2 tag), built > with > > MKLDNN > > > > 1.4.1 => PyPI mxnet-mkl==1.4.1 > > > > Machine: C5.18X > > > > No explicit environment variables were set > > > > Operator benchmark code - > > > > > > > > > > https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf > > > > > > > > Best, > > > > Sandeep > > > > > > > > > > > > On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy < > > > > [email protected]> > > > > wrote: > > > > > > > > > I will try to run a few benchmarks on a bare metal > instance > > > tonight to > > > > > remove virtualization variance from the measurements and > > provide > > > some > > > > > numbers. > > > > > > > > > > Please propose a set of models / examples that would be > > desirable > > > to > > > > > run before the release, and provide a link to an easy-to-run > > script > > > > > with instructions so we can validate the release > better. > > > > > > > > > > Thank you.
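To make the "5%+ regression" screening that Denis and Sandeep describe above concrete, a release gate could simply diff per-operator timings between two opperf runs. The sketch below assumes the timings have already been exported as plain operator-to-seconds mappings (the sample numbers and the dict format are illustrative assumptions, not the actual opperf output); it shows the check, not an agreed release criterion.

    # Hypothetical per-operator average forward times in seconds, e.g. exported
    # from two opperf runs (one per MXNet version). Values are made up for illustration.
    baseline_1_4_1 = {"BatchNorm": 0.00210, "Convolution": 0.01150, "Dropout": 0.00091}
    candidate_1_5_0 = {"BatchNorm": 0.00285, "Convolution": 0.00970, "Dropout": 0.00055}

    def find_regressions(baseline, candidate, threshold=0.05):
        # Return operators whose time grew by more than `threshold` (a fraction, 0.05 = 5%).
        regressions = {}
        for op, old in baseline.items():
            new = candidate.get(op)
            if new is None:
                continue  # operator missing from the candidate run
            change = (new - old) / old
            if change > threshold:
                regressions[op] = change
        return regressions

    for op, change in sorted(find_regressions(baseline_1_4_1, candidate_1_5_0).items()):
        print("%-12s %+.1f%%" % (op, change * 100))
    # With the sample numbers above, only BatchNorm is flagged (~+35.7%).
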
> > > > > > > > > > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei < > > [email protected]> > > > wrote: > > > > > > > > > > > > Dear @dev, > > > > > > > > > > > > I m cancelling the vote for cached op fix: > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/15298 > > > > > > > > > > > > As for the possible cpu training regression, it > looks like > > not a > > > > blocker > > > > > > for now. > > > > > > > > > > > > I will start a new rc2 vote, please help to validate. > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong < > > > [email protected] > > > > > > > > > > wrote: > > > > > > > > > > > > > Hi Pedro, > > > > > > > > > > > > > > I was able to reproduced the similar result (v1.5 > is > > ~%5.6 > > > slower > > > > than > > > > > > > v1.4, I was using 18 cores for computing) with your > > script on > > > > > C5.18xlarge. > > > > > > > But need to bind the cores with below command when > > running the > > > > script, > > > > > > > (without setting the env variables, I got a close > time > > (<1%) > > > with > > > > v1.5 > > > > > and > > > > > > > v1.4) > > > > > > > export > > > > KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0 > > > > > > > export OMP_NUM_THREADS=18 > > > > > > > > > > > > > > Did you set any env variables during running? > > > > > > > > > > > > > > The performance result I got as below: > > > > > > > 1) 1.4.1.rc0 > (1a7199691f5cbc6012bb53eecbf884bed5ae6590) > > > > > > > real 12m10.856s > > > > > > > user 234m49.576s > > > > > > > sys 4m38.044s > > > > > > > > > > > > > > 2) 1.5.0.rc1 > (4d9667121ae6fb643f2a02ab15e25231ed756cde) > > > > > > > real 12m52.140s > > > > > > > user 246m30.740s > > > > > > > sys 5m8.188s > > > > > > > > > > > > > > As I looked at the profiling data, most of the ops > have > > same > > > perf > > > > > between > > > > > > > v1.4 and v1.5. But some ops like " > _backward_BatchNorm" > > and > > > > "Pooling" > > > > > is > > > > > > > ~1.37x slower on v1.5 compared with v1.4. > > > > > > > Will do further analysis on these ops. > > > > > > > > > > > > > > Here's the hardware/OS info from my side: > > > > > > > ----------Python Info---------- > > > > > > > Version : 3.6.8 > > > > > > > Compiler : GCC 7.3.0 > > > > > > > Build : ('default', 'Dec 30 2018 01:22:34') > > > > > > > Arch : ('64bit', '') > > > > > > > ------------Pip Info----------- > > > > > > > Version : 19.0.3 > > > > > > > Directory : > > > > > > > > > > > > > > /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip > > > > > > > ----------MXNet Info----------- > > > > > > > Version : 1.5.0 > > > > > > > Directory : > > /home/ubuntu/ws/incubator-mxnet/python/mxnet > > > > > > > Hashtag not found. Not installed from pre-built > package. 
> > > > > > > ----------System Info---------- > > > > > > > Platform : > > > Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid > > > > > > > system : Linux > > > > > > > node : ip-172-31-32-129 > > > > > > > release : 4.4.0-1085-aws > > > > > > > version : #96-Ubuntu SMP Tue Jun 11 09:08:32 > UTC > > 2019 > > > > > > > ----------Hardware Info---------- > > > > > > > machine : x86_64 > > > > > > > processor : x86_64 > > > > > > > Architecture: x86_64 > > > > > > > CPU op-mode(s): 32-bit, 64-bit > > > > > > > Byte Order: Little Endian > > > > > > > CPU(s): 72 > > > > > > > On-line CPU(s) list: 0-71 > > > > > > > Thread(s) per core: 2 > > > > > > > Core(s) per socket: 18 > > > > > > > Socket(s): 2 > > > > > > > NUMA node(s): 2 > > > > > > > Vendor ID: GenuineIntel > > > > > > > CPU family: 6 > > > > > > > Model: 85 > > > > > > > Model name: Intel(R) Xeon(R) Platinum > 8124M > > CPU @ > > > > 3.00GHz > > > > > > > Stepping: 3 > > > > > > > CPU MHz: 3000.000 > > > > > > > BogoMIPS: 6000.00 > > > > > > > Hypervisor vendor: KVM > > > > > > > Virtualization type: full > > > > > > > L1d cache: 32K > > > > > > > L1i cache: 32K > > > > > > > L2 cache: 1024K > > > > > > > L3 cache: 25344K > > > > > > > NUMA node0 CPU(s): 0-17,36-53 > > > > > > > NUMA node1 CPU(s): 18-35,54-71 > > > > > > > Flags: fpu vme de pse tsc msr pae > mce > > cx8 apic > > > > sep mtrr > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 > ss ht > > syscall > > > nx > > > > > pdpe1gb > > > > > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl > > xtopology > > > > nonstop_tsc > > > > > > > aperfmperf tsc_known_freq pni pclmulqdq monitor > ssse3 > > fma cx16 > > > > pcid > > > > > sse4_1 > > > > > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes > xsave > > avx > > > f16c > > > > rdrand > > > > > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single > > kaiser > > > > fsgsbase > > > > > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid > rtm mpx > > avx512f > > > > rdseed > > > > > adx > > > > > > > smap clflushopt clwb avx512cd xsaveopt xsavec > xgetbv1 > > ida arat > > > pku > > > > > > > ----------Network Test---------- > > > > > > > > > > > > > > > > > > > > > -Ciyong > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Zhao, Patric [mailto:[email protected]] > > > > > > > Sent: Thursday, June 27, 2019 9:55 AM > > > > > > > To: [email protected] > > > > > > > Cc: [email protected] > > > > > > > Subject: RE: [VOTE] Release Apache MXNet > (incubating) > > version > > > > 1.5.0.rc1 > > > > > > > > > > > > > > Could we run more epochs to see the performance > > difference or > > > > profiling > > > > > > > the difference between good and bad run? 
> > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Pedro Larroy [mailto: > > [email protected]] > > > > > > > > Sent: Thursday, June 27, 2019 9:35 AM > > > > > > > > To: [email protected] > > > > > > > > Cc: [email protected] > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet > (incubating) > > version > > > > > > > > 1.5.0.rc1 > > > > > > > > > > > > > > > > I run again and the gap is again bigger, I guess > we > > need to > > > > average > > > > > > > > out the times across several runs: > > > > > > > > > > > > > > > > piotr@ip-172-31-63-171 > > :0:~/deeplearning-benchmark/dawnbench > > > > > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python > > cifar10.py > > > > --epochs 5 > > > > > > > > && time ~/mxnet_1.5/py3_venv/bin/python > cifar10.py > > --epochs 5 > > > > > > > > [23:17:09] > ../src/io/iter_image_recordio_2.cc:172: > > > > > > > > ImageRecordIOParser2: > > > > > > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use > > > 4 > > > > > threads > > > > > > > > for decoding.. > > > > > > > > [23:17:09] > ../src/io/iter_image_recordio_2.cc:230: > > Load mean > > > > image > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > [23:17:09] > ../src/io/iter_image_recordio_2.cc:248: > > Load mean > > > > image > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > completed > > > > > > > > [23:17:09] > ../src/io/iter_image_recordio_2.cc:172: > > > > > > > > ImageRecordIOParser2: > > > > > > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 > > > > threads > > > > > > > > for decoding.. > > > > > > > > [23:17:09] > ../src/io/iter_image_recordio_2.cc:230: > > Load mean > > > > image > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > [23:17:09] > ../src/io/iter_image_recordio_2.cc:248: > > Load mean > > > > image > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > completed > > > > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, > 123: > > 0.0005, > > > > 300: > > > > > > > > 0.0001} Epoch 0, Changed learning rate to 0.05 > > [23:17:09] > > > > > > > > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: > Allocate > > > > > > > > 147456 bytes with malloc directly > > > > > > > > [23:17:09] > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: > > > Allocate > > > > > > > > 589824 bytes with malloc directly > > > > > > > > [23:17:09] > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: > > > Allocate > > > > > > > > 2359296 bytes with malloc directly > > > > > > > > [23:17:09] > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: > > > Allocate > > > > > > > > 9437184 bytes with malloc directly > > > > > > > > Epoch 0, Batch 199, Speed=384.149839 > > > > > > > > Epoch 0, Duration=140.919567 > > > > > > > > Epoch 0, Training accuracy=0.115169 > > > > > > > > Epoch 0, Validation accuracy=0.141317 > > > > > > > > Epoch 1, Batch 199, Speed=433.380512 > > > > > > > > Epoch 1, Duration=119.553233 > > > > > > > > Epoch 1, Training accuracy=0.170956 > > > > > > > > Epoch 1, Validation accuracy=0.216146 > > > > > > > > Epoch 2, Batch 199, Speed=434.864699 > > > > > > > > Epoch 2, Duration=123.278490 > > > > > > > > Epoch 2, Training accuracy=0.209455 > > > > > > > > Epoch 2, Validation accuracy=0.247296 > > > > > > > > Epoch 3, Batch 199, Speed=433.401854 > > > > > > > > Epoch 3, Duration=118.327797 > > > > > > > > Epoch 3, Training accuracy=0.248701 > > > > > > > > Epoch 3, Validation 
accuracy=0.302083 > > > > > > > > Epoch 4, Batch 199, Speed=419.713707 > > > > > > > > Epoch 4, Duration=126.468409 > > > > > > > > Epoch 4, Training accuracy=0.260949 > > > > > > > > Epoch 4, Validation accuracy=0.269030 > > > > > > > > > > > > > > > > real 10m55.796s > > > > > > > > user 399m33.567s > > > > > > > > sys 13m55.904s > > > > > > > > [23:28:04] > ../src/io/iter_image_recordio_2.cc:172: > > > > > > > > ImageRecordIOParser2: > > > > > > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use > > > 4 > > > > > threads > > > > > > > > for decoding.. > > > > > > > > [23:28:04] > ../src/io/iter_image_recordio_2.cc:230: > > Load mean > > > > image > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > [23:28:04] > ../src/io/iter_image_recordio_2.cc:248: > > Load mean > > > > image > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > completed > > > > > > > > [23:28:04] > ../src/io/iter_image_recordio_2.cc:172: > > > > > > > > ImageRecordIOParser2: > > > > > > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 > > > > threads > > > > > > > > for decoding.. > > > > > > > > [23:28:04] > ../src/io/iter_image_recordio_2.cc:230: > > Load mean > > > > image > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > [23:28:04] > ../src/io/iter_image_recordio_2.cc:248: > > Load mean > > > > image > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > completed > > > > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, > 123: > > 0.0005, > > > > 300: > > > > > > > > 0.0001} Epoch 0, Changed learning rate to 0.05 > Epoch > > 0, Batch > > > > 199, > > > > > > > > Speed=419.039188 Epoch 0, Duration=143.934903 > Epoch 0, > > > Training > > > > > > > > accuracy=0.122542 Epoch 0, Validation > accuracy=0.164359 > > > Epoch 1, > > > > > Batch > > > > > > > > 199, Speed=445.257048 Epoch 1, > Duration=135.248399 > > Epoch 1, > > > > Training > > > > > > > > accuracy=0.178828 Epoch 1, Validation > accuracy=0.199419 > > > Epoch 2, > > > > > Batch > > > > > > > > 199, Speed=447.115215 Epoch 2, > Duration=132.003770 > > Epoch 2, > > > > Training > > > > > > > > accuracy=0.217808 Epoch 2, Validation > accuracy=0.233073 > > > Epoch 3, > > > > > Batch > > > > > > > > 199, Speed=441.079477 Epoch 3, > Duration=126.543316 > > Epoch 3, > > > > Training > > > > > > > > accuracy=0.248102 Epoch 3, Validation > accuracy=0.293870 > > > Epoch 4, > > > > > Batch > > > > > > > > 199, Speed=449.329787 Epoch 4, > Duration=138.398325 > > Epoch 4, > > > > Training > > > > > > > > accuracy=0.270021 Epoch 4, Validation > accuracy=0.311498 > > > > > > > > > > > > > > > > real 11m45.329s > > > > > > > > user 426m13.908s > > > > > > > > sys 16m45.093s > > > > > > > > > > > > > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy > > > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > > > The difference looks smaller now, more like > your > > numbers. I > > > > wonder > > > > > > > > > if something happened during the previous > benchmark > > like a > > > > system > > > > > > > > > update... 
> > > > > > > > > > > > > > > > > > > > > > > > > > > piotr@ip-172-31-63-171 > > > :0:~/deeplearning-benchmark/dawnbench > > > > > > > > (master)+$ > > > > > > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py > > --epochs 5 > > > && > > > > time > > > > > > > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py > --epochs 5 > > > > [22:49:41] > > > > > > > > > ../src/io/iter_image_recordio_2.cc:172: > > > > > > > > > ImageRecordIOParser2: > > > > > > > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, > > > use 4 > > > > > > > > > threads for decoding.. > > > > > > > > > [22:49:41] > ../src/io/iter_image_recordio_2.cc:230: > > Load > > > mean > > > > image > > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > > [22:49:41] > ../src/io/iter_image_recordio_2.cc:248: > > Load > > > mean > > > > image > > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > completed > > > > > > > > > [22:49:41] > ../src/io/iter_image_recordio_2.cc:172: > > > > > > > > > ImageRecordIOParser2: > > > > > > > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, > > > use 4 > > > > > > > > > threads for decoding.. > > > > > > > > > [22:49:41] > ../src/io/iter_image_recordio_2.cc:230: > > Load > > > mean > > > > image > > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > > [22:49:41] > ../src/io/iter_image_recordio_2.cc:248: > > Load > > > mean > > > > image > > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > completed > > > > > > > > > lr_schedule: {0: 0.05, 82: > 0.005000000000000001, 123: > > > 0.0005, > > > > 300: > > > > > > > > > 0.0001} Epoch 0, Changed learning rate to 0.05 > > [22:49:42] > > > > > > > > > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: > Allocate > > > > > > > > > 147456 bytes with malloc directly > > > > > > > > > [22:49:42] > > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: > > > > Allocate > > > > > > > > > 589824 bytes with malloc directly > > > > > > > > > [22:49:42] > > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: > > > > Allocate > > > > > > > > > 2359296 bytes with malloc directly > > > > > > > > > [22:49:42] > > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: > > > > Allocate > > > > > > > > > 9437184 bytes with malloc directly > > > > > > > > > Epoch 0, Batch 199, Speed=426.182733 Epoch 0, > > > > Duration=134.868458 > > > > > > > > > Epoch 0, Training accuracy=0.127238 Epoch 0, > > Validation > > > > > > > > > accuracy=0.206388 Epoch 1, Batch 199, > > Speed=313.127156 > > > Epoch > > > > 1, > > > > > > > > > Duration=128.041775 Epoch 1, Training > > accuracy=0.182065 > > > Epoch > > > > 1, > > > > > > > > > Validation accuracy=0.202524 Epoch 2, Batch > 199, > > > > Speed=410.931187 > > > > > > > > > Epoch 2, Duration=124.920588 Epoch 2, Training > > > > accuracy=0.202584 > > > > > > > > > Epoch 2, Validation accuracy=0.245693 Epoch 3, > Batch > > 199, > > > > > > > > > Speed=419.119335 Epoch 3, Duration=120.948349 > Epoch > > 3, > > > > Training > > > > > > > > > accuracy=0.235854 Epoch 3, Validation > > accuracy=0.291066 > > > Epoch > > > > 4, > > > > > > > > > Batch 199, Speed=430.473733 Epoch 4, > > Duration=130.181724 > > > > Epoch 4, > > > > > > > > > Training accuracy=0.257773 Epoch 4, Validation > > > > accuracy=0.304988 > > > > > > > > > > > > > > > > > > real 11m7.356s > > > > > > > > > user 406m9.910s > > > > > > > > > sys 14m18.349s > > > > > > > > > [23:00:49] > 
../src/io/iter_image_recordio_2.cc:172: > > > > > > > > > ImageRecordIOParser2: > > > > > > > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, > > > use 4 > > > > > > > > > threads for decoding.. > > > > > > > > > [23:00:49] > ../src/io/iter_image_recordio_2.cc:230: > > Load > > > mean > > > > image > > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > > [23:00:49] > ../src/io/iter_image_recordio_2.cc:248: > > Load > > > mean > > > > image > > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > completed > > > > > > > > > [23:00:49] > ../src/io/iter_image_recordio_2.cc:172: > > > > > > > > > ImageRecordIOParser2: > > > > > > > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, > > > use 4 > > > > > > > > > threads for decoding.. > > > > > > > > > [23:00:49] > ../src/io/iter_image_recordio_2.cc:230: > > Load > > > mean > > > > image > > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > > [23:00:49] > ../src/io/iter_image_recordio_2.cc:248: > > Load > > > mean > > > > image > > > > > > > > > from > > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin > > > > > > > > completed > > > > > > > > > lr_schedule: {0: 0.05, 82: > 0.005000000000000001, 123: > > > 0.0005, > > > > 300: > > > > > > > > > 0.0001} Epoch 0, Changed learning rate to 0.05 > Epoch > > 0, > > > Batch > > > > 199, > > > > > > > > > Speed=348.618154 Epoch 0, Duration=146.469352 > Epoch > > 0, > > > > Training > > > > > > > > > accuracy=0.124121 Epoch 0, Validation > > accuracy=0.167227 > > > Epoch > > > > 1, > > > > > > > > > Batch 199, Speed=452.790825 Epoch 1, > > Duration=130.199421 > > > > Epoch 1, > > > > > > > > > Training > > > > > > > > > accuracy=0.183863 Epoch 1, Validation > > accuracy=0.237079 > > > Epoch > > > > 2, > > > > > > > > > Batch 199, Speed=451.406559 Epoch 2, > > Duration=126.320823 > > > > Epoch 2, > > > > > > > > > Training > > > > > > > > > accuracy=0.214844 Epoch 2, Validation > > accuracy=0.244692 > > > Epoch > > > > 3, > > > > > > > > > Batch 199, Speed=403.161873 Epoch 3, > > Duration=125.331660 > > > > Epoch 3, > > > > > > > > > Training > > > > > > > > > accuracy=0.243506 Epoch 3, Validation > > accuracy=0.301182 > > > Epoch > > > > 4, > > > > > > > > > Batch 199, Speed=450.826598 Epoch 4, > > Duration=126.426253 > > > > Epoch 4, > > > > > > > > > Training > > > > > > > > > accuracy=0.266424 Epoch 4, Validation > > accuracy=0.311899 > > > > > > > > > > > > > > > > > > real 11m21.930s > > > > > > > > > user 415m3.855s > > > > > > > > > sys 13m53.975s > > > > > > > > > > > > > > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy > > > > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > > > > > Hi Ciyong, thanks for trying to reproduce: > > > > > > > > > > > > > > > > > > > > I used this one: > > > > > > > > > > https://github.com/awslabs/deeplearning- > > > > > > > > benchmark/blob/master/dawnbe > > > > > > > > > > nch/cifar10.py > > > > > > > > > > > > > > > > > > > > Could you provide hardware and OS details? > > > > > > > > > > > > > > > > > > > > I will rerun and repost numbers in a few > minutes. > > > > > > > > > > > > > > > > > > > > Pedro. 
> > > > > > > > > > > > > > > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong > > > > > > > > > > <[email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > Hi Pedro, > > > > > > > > > > > > > > > > > > > > > > I'm looking at this case, and using the > script of > > > > > > > > > > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py" > > > > > > > > > > > to get > > > > > > > > the timing data, but seems there's not much > difference > > > between > > > > mxnet > > > > > > > > 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge. > > > > > > > > > > > > > > > > > > > > > > Not sure if there's any difference in the > python > > > script, > > > > can > > > > > you > > > > > > > > > > > point me > > > > > > > > the link to get your script (cifar10.py)? > > > > > > > > > > > Or you can also have a try with MXNet's > script > > > > > > > > > > > (train_cifar10.py) and see > > > > > > > > the performance. > > > > > > > > > > > > > > > > > > > > > > Here's the command I used to collect the > time: > > > > > > > > > > > python train_cifar10.py > --num-epoch=5 > > > > > > > > > > > > > > > > > > > > > > 1) 1.5.0.rc1 > > (4d9667121ae6fb643f2a02ab15e25231ed756cde) > > > > > > > > > > > real 9m4.880s > > > > > > > > > > > user 333m13.340s > > > > > > > > > > > sys 14m36.100s > > > > > > > > > > > > > > > > > > > > > > 2) 1.4.1.rc0 > > (1a7199691f5cbc6012bb53eecbf884bed5ae6590) > > > > > > > > > > > real 9m2.155s > > > > > > > > > > > user 329m37.092s > > > > > > > > > > > sys 16m8.668s > > > > > > > > > > > > > > > > > > > > > > -Ciyong > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > > From: Pedro Larroy [mailto: > > > [email protected]] > > > > > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM > > > > > > > > > > > To: [email protected] > > > > > > > > > > > Cc: [email protected] > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet > > (incubating) > > > > version > > > > > > > > > > > 1.5.0.rc1 > > > > > > > > > > > > > > > > > > > > > > Hi these were my build flags and system > info: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- # CMake configuration > > > > > > > > > > > USE_CUDA: "OFF" # Build with CUDA support > > > > > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old > cmake > > cuda > > > > > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA > > > > > > > > > > > USE_OPENCV: "ON" # Build with OpenCV > support > > > > > > > > > > > USE_OPENMP: "ON" # Build with Openmp > support > > > > > > > > > > > USE_CUDNN: "ON" # Build with cudnn > support) # > > one could > > > > set > > > > > > > > > > > CUDNN_ROOT for search path > > > > > > > > > > > USE_SSE: "ON" # Build with x86 SSE > instruction > > support > > > IF > > > > NOT > > > > > > > > > > > ARM > > > > > > > > > > > USE_F16C: "ON" # Build with x86 F16C > instruction > > > support) > > > > # > > > > > > > > autodetects support if "ON" > > > > > > > > > > > USE_LAPACK: "ON" # Build with lapack > support > > > > > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if > found > > > > > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant > of MKL > > (if MKL > > > > found) > > > > > > > > > > > IF USE_MKL_IF_AVAILABLE AND (NOT APPLE) > > > > > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of > MKL (if > > MKL > > > > found) IF > > > > > > > > > > > USE_MKL_IF_AVAILABLE AND (NOT APPLE) > > > > > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable > auto-tuning of > > > > operators IF > > > > > > 
> > NOT > > > > > > > > > > > MSVC > > > > > > > > > > > USE_GPERFTOOLS: "ON" # Build with > GPerfTools > > support > > > (if > > > > found) > > > > > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc > support > > > > > > > > > > > USE_PROFILER: "ON" # Build with Profiler > support > > > > > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with > DIST_KVSTORE > > > support > > > > > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC > Plugins > > > > > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin > > > > > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package > > > > > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet > library > > naming > > > > > > > > conventions. > > > > > > > > > > > USE_GPROF: "OFF" # Compile with gprof > > (profiling) flag > > > > > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with > C++14 > > if the > > > > > compiler > > > > > > > > > > > supports it > > > > > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel > Amplifier > > XE > > > > (VTune)) # > > > > > > > > > > > one could set VTUNE_ROOT for search path > > > > > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA > runtime > > > > compilation > > > > > > > > > > > support > > > > > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp > examples > > > > > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the > example > > source > > > > files. > > > > > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack > traces on > > > > segfaults. > > > > > > > > > > > USE_TENSORRT: "OFF" # Enable infeference > > optimization > > > with > > > > > > > TensorRT. > > > > > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN > > sanitizers. > > > > > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable > compilation > > with > > > test > > > > > > > > > > > coverage metric output > > > > > > > > > > > CMAKE_BUILD_TYPE: "Release" > > > > > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache" > > > > > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache" > > > > > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache" > > > > > > > > > > > > > > > > > > > > > > commit > 4d9667121ae6fb643f2a02ab15e25231ed756cde > > (HEAD, > > > > tag: > > > > > > > > > > > 1.5.0.rc1, > > > > > > > > > > > upstream/v1.5.x) > > > > > > > > > > > commit > 1a7199691f5cbc6012bb53eecbf884bed5ae6590 > > (HEAD, > > > > tag: > > > > > > > > > > > 1.4.1.rc0, > > > > > > > > > > > upstream/v1.4.x) > > > > > > > > > > > > > > > > > > > > > > curl > > > http://169.254.169.254/latest/meta-data/instance-type > > > > > > > > > > > c5d.18xlarge > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Version : 3.6.7 > > > > > > > > > > > Compiler : GCC 8.2.0 > > > > > > > > > > > Build : ('default', 'Oct 22 2018 > > 11:32:17') > > > > > > > > > > > Arch : ('64bit', 'ELF') > > > > > > > > > > > ------------Pip Info----------- > > > > > > > > > > > Version : 19.1.1 > > > > > > > > > > > Directory : > > > > > /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site- > > > > > > > > packages/pip > > > > > > > > > > > ----------MXNet Info----------- > > > > > > > > > > > Version : 1.5.0 > > > > > > > > > > > Directory : > /home/piotr/mxnet_1.5/python/mxnet > > > > > > > > > > > Hashtag not found. Not installed from > pre-built > > > package. 
> > > > > > > > > > > ----------System Info---------- > > > > > > > > > > > Platform : > > > > > > > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic > > > > > > > > > > > system : Linux > > > > > > > > > > > node : ip-172-31-63-171 > > > > > > > > > > > release : 4.15.0-1035-aws > > > > > > > > > > > version : #37-Ubuntu SMP Mon Mar 18 > > 16:15:14 UTC > > > 2019 > > > > > > > > > > > ----------Hardware Info---------- > > > > > > > > > > > machine : x86_64 > > > > > > > > > > > processor : x86_64 > > > > > > > > > > > Architecture: x86_64 > > > > > > > > > > > CPU op-mode(s): 32-bit, 64-bit > > > > > > > > > > > Byte Order: Little Endian > > > > > > > > > > > CPU(s): 72 > > > > > > > > > > > On-line CPU(s) list: 0-71 > > > > > > > > > > > Thread(s) per core: 2 > > > > > > > > > > > Core(s) per socket: 18 > > > > > > > > > > > Socket(s): 2 > > > > > > > > > > > NUMA node(s): 2 > > > > > > > > > > > Vendor ID: GenuineIntel > > > > > > > > > > > CPU family: 6 > > > > > > > > > > > Model: 85 > > > > > > > > > > > Model name: Intel(R) Xeon(R) > Platinum > > 8124M > > > CPU @ > > > > > 3.00GHz > > > > > > > > > > > Stepping: 4 > > > > > > > > > > > CPU MHz: 1326.446 > > > > > > > > > > > BogoMIPS: 6000.00 > > > > > > > > > > > Hypervisor vendor: KVM > > > > > > > > > > > Virtualization type: full > > > > > > > > > > > L1d cache: 32K > > > > > > > > > > > L1i cache: 32K > > > > > > > > > > > L2 cache: 1024K > > > > > > > > > > > L3 cache: 25344K > > > > > > > > > > > NUMA node0 CPU(s): 0-17,36-53 > > > > > > > > > > > NUMA node1 CPU(s): 18-35,54-71 > > > > > > > > > > > Flags: fpu vme de pse tsc > msr pae > > mce cx8 > > > > apic > > > > > sep > > > > > > > mtrr > > > > > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr > sse sse2 > > ss ht > > > > syscall > > > > > > > > > > > nx pdpe1gb rdtscp lm constant_tsc > arch_perfmon > > rep_good > > > > nopl > > > > > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni > > pclmulqdq > > > > monitor > > > > > > > > > > > ssse3 fma cx16 pcid > > > > > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt > > tsc_deadline_timer > > > aes > > > > xsave > > > > > > > > > > > avx f16c rdrand hypervisor lahf_lm abm > > 3dnowprefetch > > > > > > > > > > > invpcid_single pti fsgsbase tsc_adjust > bmi1 hle > > avx2 > > > smep > > > > bmi2 > > > > > > > > > > > erms invpcid rtm mpx avx512f avx512dq > rdseed adx > > smap > > > > > clflushopt > > > > > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt > xsavec > > xgetbv1 > > > > xsaves > > > > > > > > > > > ida arat pku ospke ----------Network > > Test---------- > > > > > > > > > > > > > > > > > > > > > > ----------Python Info---------- > > > > > > > > > > > Version : 3.6.7 > > > > > > > > > > > Compiler : GCC 8.2.0 > > > > > > > > > > > Build : ('default', 'Oct 22 2018 > > 11:32:17') > > > > > > > > > > > Arch : ('64bit', 'ELF') > > > > > > > > > > > ------------Pip Info----------- > > > > > > > > > > > Version : 19.1.1 > > > > > > > > > > > Directory : > > > > > /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site- > > > > > > > > packages/pip > > > > > > > > > > > ----------MXNet Info----------- > > > > > > > > > > > Version : 1.4.1 > > > > > > > > > > > Directory : > /home/piotr/mxnet_1.4/python/mxnet > > > > > > > > > > > Hashtag not found. Not installed from > pre-built > > > package. 
> > > > > > > > > > > ----------System Info---------- > > > > > > > > > > > Platform : > > > > > > > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic > > > > > > > > > > > system : Linux > > > > > > > > > > > node : ip-172-31-63-171 > > > > > > > > > > > release : 4.15.0-1035-aws > > > > > > > > > > > version : #37-Ubuntu SMP Mon Mar 18 > > 16:15:14 UTC > > > 2019 > > > > > > > > > > > ----------Hardware Info---------- > > > > > > > > > > > machine : x86_64 > > > > > > > > > > > processor : x86_64 > > > > > > > > > > > Architecture: x86_64 > > > > > > > > > > > CPU op-mode(s): 32-bit, 64-bit > > > > > > > > > > > Byte Order: Little Endian > > > > > > > > > > > CPU(s): 72 > > > > > > > > > > > On-line CPU(s) list: 0-71 > > > > > > > > > > > Thread(s) per core: 2 > > > > > > > > > > > Core(s) per socket: 18 > > > > > > > > > > > Socket(s): 2 > > > > > > > > > > > NUMA node(s): 2 > > > > > > > > > > > Vendor ID: GenuineIntel > > > > > > > > > > > CPU family: 6 > > > > > > > > > > > Model: 85 > > > > > > > > > > > Model name: Intel(R) Xeon(R) > Platinum > > 8124M > > > CPU @ > > > > > 3.00GHz > > > > > > > > > > > Stepping: 4 > > > > > > > > > > > CPU MHz: 1223.344 > > > > > > > > > > > BogoMIPS: 6000.00 > > > > > > > > > > > Hypervisor vendor: KVM > > > > > > > > > > > Virtualization type: full > > > > > > > > > > > L1d cache: 32K > > > > > > > > > > > L1i cache: 32K > > > > > > > > > > > L2 cache: 1024K > > > > > > > > > > > L3 cache: 25344K > > > > > > > > > > > NUMA node0 CPU(s): 0-17,36-53 > > > > > > > > > > > NUMA node1 CPU(s): 18-35,54-71 > > > > > > > > > > > Flags: fpu vme de pse tsc > msr pae > > mce cx8 > > > > apic > > > > > sep > > > > > > > mtrr > > > > > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr > sse sse2 > > ss ht > > > > syscall > > > > > > > > > > > nx pdpe1gb rdtscp lm constant_tsc > arch_perfmon > > rep_good > > > > nopl > > > > > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni > > pclmulqdq > > > > monitor > > > > > > > > > > > ssse3 fma cx16 pcid > > > > > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt > > tsc_deadline_timer > > > aes > > > > xsave > > > > > > > > > > > avx f16c rdrand hypervisor lahf_lm abm > > 3dnowprefetch > > > > > > > > > > > invpcid_single pti fsgsbase tsc_adjust > bmi1 hle > > avx2 > > > smep > > > > bmi2 > > > > > > > > > > > erms invpcid rtm mpx avx512f avx512dq > rdseed adx > > smap > > > > > clflushopt > > > > > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt > xsavec > > xgetbv1 > > > > xsaves > > > > > > > > > > > ida arat pku ospke ----------Network > > Test---------- > > > > > > > > > > > > > > > > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro > Larroy > > > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > > > > > > > > > I did a training of cifar10 in CPU and > seems > > there's > > > > some > > > > > > > > > > > > regressions in the range of 7% increase > of > > training > > > time > > > > > against > > > > > > > 1.4.1: > > > > > > > > > > > > > > > > > > > > > > > > (py3_venv) > > > > > > > > > > > > piotr@ip-172-31-63-171 > > > > :0:~/deeplearning-benchmark/dawnbench > > > > > > > > > > > > (master)+$ time python cifar10.py > --epochs 5 > > > > > > > > > > > > real 11m30.388s > > > > > > > > > > > > user 417m7.766s > > > > > > > > > > > > sys 16m57.315s > > > > > > > > > > > > > > > > > > > > > > > > VS 1.4.1: > > > > > > > > > > > > real 10m41.994s > > > > > > > > > > > > user 392m40.646s > > > > > > > > > > > > sys 12m30.601s > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei > < > > > > > [email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Anirudh, > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for jumping into this quickly, I > > followed up > > > > on the > > > > > > > issue. > > > > > > > > > > > > > > > > > > > > > > > > > > I was meant for sockeye > > developer/maintainers to > > > help > > > > setup > > > > > > > > > > > > > nightly tests and raise issues early. > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM > Haibin Lin > > > > > > > > > > > > > <[email protected]> > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > In GluonNLP we are testing with MXNET > > nightly > > > build > > > > for > > > > > > > > > > > > > > each PR, and we did find some MXNet > > related issue > > > > caught > > > > > by > > > > > > > the CI. > > > > > > > > > > > > > > I recommend other toolkits also add > > integration > > > > tests > > > > > with > > > > > > > > > > > > > > MXNet > > > > > > > > nightly. > > > > > > > > > > > > > > It helps identify issues early. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > Haibin > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, > Patric > > > > > > > > > > > > > > <[email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks to raise the issue and we > will > > take a > > > look > > > > ASAP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The downstream cases is not in the > MXNet > > CI so > > > > it's > > > > > hard > > > > > > > > > > > > > > > to catch the potential bugs or > > performance > > > > degradation > > > > > > > > > > > > > > > for > > > > > > > > MXNet developers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In the future, I suggest adding > the major > > > > downstream > > > > > > > > > > > > > > > test cases, like > > > > > > > > > > > > > > from > > > > > > > > > > > > > > > sockeye, GluonNLP, GLuonCV, DGL, > > Gluon-TS, into > > > > the > > > > > > > > > > > > > > > nightly > > > > > > > > test. 
> > > > > > > > > > > > > > > If it's still too heavy, maybe > testing > > it > > > weekly > > > > or > > > > > > > > > > > > > > > monthly :) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --Patric > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > > > > > > > From: Anirudh Subramanian > > > > > > > > > > > > > > > > [mailto:[email protected]] > > > > > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 > AM > > > > > > > > > > > > > > > > To: > [email protected] > > > > > > > > > > > > > > > > Cc: [email protected] > > > > > > > > > > > > > > > > Subject: Re: [VOTE] Release > Apache > > MXNet > > > > (incubating) > > > > > > > > > > > > > > > > version > > > > > > > > > > > > > > > > 1.5.0.rc1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Lai, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have opened an issue: > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297 > > > > > > > > > > > > > > > > I came to know about this issue > only > > today > > > and > > > > I have > > > > > > > > > > > > > > > > not been > > > > > > > > > > > > > > monitoring > > > > > > > > > > > > > > > > sockeye. > > > > > > > > > > > > > > > > I jumped onto this issue to make > sure > > it > > > wasn't > > > > > caused > > > > > > > > > > > > > > > > by the dlpack > > > > > > > > > > > > > > > changes. > > > > > > > > > > > > > > > > Also, I don't think sockeye CI > checks > > > against > > > > > master, > > > > > > > > > > > > > > > > it is using > > > > > > > > > > > > > > 1.4.1. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Anirudh > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM > Lai Wei > > > > > > > > > > > > > > > > <[email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Could you share which test > failed and > > > what’s > > > > the > > > > > > > > > > > > > > > > > crash? How to reproduce it? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I was able to install sockeye > and > > run all > > > > tests > > > > > passed. > > > > > > > > > > > > > > > > > Using python setup.py test > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have tested both nightly pip > > package and > > > > > 1.5.0.rc1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It would be great to create an > issue > > with > > > > > > > > > > > > > > > > > reproducible steps and move the > > discussion > > > > there. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Also I see sockeye nightly > build[1] > > has > > > been > > > > > failing > > > > > > > > > > > > > > > > > for some time, > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > > it’s due to MXNet change, > please > > raise this > > > > early > > > > > so > > > > > > > > > > > > > > > > > we can track and solve it in > time > > rather > > > than > > > > block > > > > > > > > > > > > > > > > > the release > > > > > > > > during vote time. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > https://travis-ci.org/awslabs/sockeye > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM > > Anirudh > > > > Subramanian > > > > > > > > > > > > > > > > > > > > -- Sandeep Krishnamurthy
