"if you’re linking in two then you’re doing something wrong." Correct, that's one thing I believe we've got consensus on. So let's call that out as a bug to be fixed.
Let's move forward with some reproducible numbers and then discuss the pros / cons of which particular OMP implementation we should use.

On Wed, Jun 19, 2019 at 3:06 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:

> Hi Chris
>
> I would ask you to have a bit of patience and help us with your
> experience in this matter. Nobody is ignoring anything; I think we are
> individually gathering feedback, trying to understand the multiple
> contributions made to this topic (including yours), and then going
> step by step: understand what is going on, run experiments, and report
> back to the list or the corresponding GitHub issue. Kellen suggested
> preparing some containers, and that takes effort.
>
> Regarding your final comment, most of us also have many other things
> to do and responsibilities, even if our daytime jobs might involve
> MXNet in some form or another. I think that's part of the privilege
> and responsibility of working closely with an open source project and
> the magic of collaboration across organizations. Let's all be patient
> and take some time to understand and reason about this topic, which is
> not simple. Since we decided to step back and gather more data, let's
> take the time and do it properly.
>
> Personally I hope to find time to look into this issue again before
> the end of the week.
>
> Thanks.
>
> Pedro.
>
> On Wed, Jun 19, 2019 at 2:43 PM Chris Olivier <cjolivie...@apache.org> wrote:
> >
> > if you're linking in two then you're doing something wrong. You can
> > see by my email yesterday that only one is linked in. This is also
> > the case with the MKL version built by the Makefile -- only the Intel
> > OMP library is used (no libgomp).
> >
> > That being said, do you have clear evidence that using Intel OMP is
> > both problematic and the situation isn't fixable? The burden of proof
> > is on the ones requesting the change -- it is not my responsibility to
> > justify the current state. There must be something "terrible" and
> > unfixable to justify a change. I have seen no proof of this in all
> > this time.
> >
> > On a side note, I mentioned a couple of things in my email yesterday
> > that still are not being responded to (they were also ignored in the
> > last incarnation of this "discussion" -- I have enough experience in
> > this matter to assume "discussion" is a waste of my time, seeing as I
> > am not paid to "work on" mxnet like y'all are).
> >
> > -C
> >
> > On Wed, Jun 19, 2019 at 10:28 AM kellen sunderland
> > <kellen.sunderl...@gmail.com> wrote:
> >
> > > I've also quite often seen two versions of OpenMP linked. I think
> > > we can all agree we probably want to avoid linking in two libraries
> > > that do effectively the same thing.
> > >
> > > The performance questions should be fairly straightforward to
> > > demonstrate, right? Could we just collaborate on a few minimal
> > > Dockerfiles that show (or don't show) Intel OpenMP performance
> > > speedups with the workloads Chris is referencing?
> > >
> > > On Wed, Jun 19, 2019 at 4:44 AM Tsukrov, Stanislav
> > > <stanislav.tsuk...@gmail.com> wrote:
> > >
> > > > Hi, Chris!
> > > >
> > > > Stas here - I've gathered that performance data.
> > > > Sure thing, I can be wrong, but please elaborate a bit on what we
> > > > are missing. Be assured, intentional misdirection was never the
> > > > case.
> > > >
> > > > Thanks a lot for being constructive.
> > > >
> > > > > Turning Intel OMP on and off (and MKL as well, since it tends
> > > > > to pull in omp, depending which one is linked in).
> > > >
> > > > We never considered turning MKL off. We are on the same page
> > > > here -- MKL is crucial for performance. Why should we? There's a
> > > > GOMP-linked version of MKL that we can use.
> > > >
> > > > What we did was measure whether using the compiler's default
> > > > OpenMP implementation, instead of the referenced source-code
> > > > distribution of OpenMP, makes anything slower. We found the
> > > > impact to be hardly measurable: the difference between GOMP and
> > > > iOMP is <5% on our benchmarks, most of the time less than that.
> > > >
> > > > We just suggest simplifying the build of mxnet by removing the
> > > > unnecessary dependency.
> > > >
> > > > During that work we discovered, for example, the following
> > > > amazing issue:
> > > > https://github.com/apache/incubator-mxnet/issues/14087
> > > >
> > > > Best Regards
> > > >
> > > > Stas
> > > >
> > > > On 18.06.19, 18:24, "Chris Olivier" <cjolivie...@gmail.com> wrote:
> > > >
> > > >     I am very reluctant to feed the trolls again, and this will
> > > >     be the last time I address Pedro or Anton on the subject, but
> > > >     since I think the numbers being presented are incorrect
> > > >     (either by the builders not really understanding what they
> > > >     are building, or possibly intentional misdirection):
> > > >
> > > >     Turning Intel OMP on and off (and MKL as well, since it tends
> > > >     to pull in omp, depending which one is linked in): there is a
> > > >     HUGE difference. This is consistent with my experience before,
> > > >     when it was added.
> > > >
> > > >     default mnist:
> > > >
> > > >     python ../example/image-classification/train_mnist.py
> > > >     INFO:root:start with arguments Namespace(add_stn=False,
> > > >     batch_size=64, disp_batches=100, dtype='float32',
> > > >     gc_threshold=0.5, gc_type='none', gpus=None,
> > > >     image_shape='1, 28, 28', initializer='default',
> > > >     kv_store='device', load_epoch=None, loss='', lr=0.05,
> > > >     lr_factor=0.1, lr_step_epochs='10', macrobatch_size=0,
> > > >     model_prefix=None, mom=0.9, monitor=0, network='mlp',
> > > >     num_classes=10, num_epochs=20, num_examples=60000,
> > > >     num_layers=None, optimizer='sgd', profile_server_suffix='',
> > > >     profile_worker_suffix='', save_period=1, test_io=0, top_k=0,
> > > >     warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
> > > >
> > > >     INTEL OMP:
> > > >
> > > >     ldd libmxnet.so | grep omp
> > > >       libomp.so => /home/chris/src/mxnet/cmake_omp/3rdparty/openmp/runtime/src/libomp.so (0x00007f978fde7000)
> > > >
> > > >     INFO:root:Epoch[0] Batch [0-100]    Speed: 31548.09 samples/sec  accuracy=0.780012
> > > >     INFO:root:Epoch[0] Batch [100-200]  Speed: 16073.21 samples/sec  accuracy=0.920469
> > > >     INFO:root:Epoch[0] Batch [200-300]  Speed: 19075.91 samples/sec  accuracy=0.928281
> > > >     INFO:root:Epoch[0] Batch [300-400]  Speed: 23211.36 samples/sec  accuracy=0.942813
> > > >     INFO:root:Epoch[0] Batch [400-500]  Speed: 22139.79 samples/sec  accuracy=0.938750
> > > >     INFO:root:Epoch[0] Batch [500-600]  Speed: 23225.52 samples/sec  accuracy=0.946562
> > > >     INFO:root:Epoch[0] Batch [600-700]  Speed: 19547.41 samples/sec  accuracy=0.953281
> > > >     INFO:root:Epoch[0] Batch [700-800]  Speed: 24111.73 samples/sec  accuracy=0.951562
> > > >     INFO:root:Epoch[0] Batch [800-900]  Speed: 13959.88 samples/sec  accuracy=0.957500
> > > >     INFO:root:Epoch[0] Train-accuracy=0.925423
> > > >     INFO:root:Epoch[0] Time cost=3.806
> > > >     INFO:root:Epoch[0] Validation-accuracy=0.962580
> > > >     INFO:root:Epoch[1] Batch [0-100]    Speed: 24560.21 samples/sec  accuracy=0.968131
> > > >     INFO:root:Epoch[1] Batch [100-200]  Speed: 23457.03 samples/sec  accuracy=0.966250
> > > >
> > > >     LIBGOMP:
> > > >
> > > >     ldd libmxnet.so | grep omp
> > > >       libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f25c25dd000)
> > > >
> > > >     INFO:root:Epoch[0] Batch [0-100]    Speed: 1731.01 samples/sec   accuracy=0.782488
> > > >     INFO:root:Epoch[0] Batch [100-200]  Speed: 3551.32 samples/sec   accuracy=0.907813
> > > >     INFO:root:Epoch[0] Batch [200-300]  Speed: 1991.00 samples/sec   accuracy=0.927188
> > > >     INFO:root:Epoch[0] Batch [300-400]  Speed: 2175.45 samples/sec   accuracy=0.937969
> > > >     INFO:root:Epoch[0] Batch [400-500]  Speed: 1644.95 samples/sec   accuracy=0.942187
> > > >     INFO:root:Epoch[0] Batch [500-600]  Speed: 6444.58 samples/sec   accuracy=0.950156
> > > >     INFO:root:Epoch[0] Batch [600-700]  Speed: 7842.16 samples/sec   accuracy=0.947969
> > > >     INFO:root:Epoch[0] Batch [700-800]  Speed: 9412.07 samples/sec   accuracy=0.953750
> > > >     INFO:root:Epoch[0] Batch [800-900]  Speed: 12707.58 samples/sec  accuracy=0.953125
> > > >
> > > >     That being said, there are other issues beyond speed. The
> > > >     DEFAULT build from the Makefile (not CMake) uses Intel OMP MKL
> > > >     (I showed this before) and mysteriously it has no issues? This
> > > >     seems highly suspicious. All I see is a lot of hand-waving and
> > > >     conjecture and pointing to StackOverflow posts made by people
> > > >     who may be of questionable pedigree to begin with. This smells
> > > >     of a Pedro-ego-fight rather than one of purely technical
> > > >     merit. Also, if one knows how OMP works, they would be very
> > > >     suspicious of the "intermittent hangs" claim -- that's probably
> > > >     just broken race conditions elsewhere until proven otherwise.
> > > >     It'd tend to freeze on the first use if something were wrong
> > > >     (try using libgomp after a fork and see), since worker threads
> > > >     wouldn't be assigned/joined properly. Intel OMP is faster, but
> > > >     it also has other advantages, such as allowing OMP after a
> > > >     fork.
> > > >
> > > >     I actually addressed a lot of issues and asked for
> > > >     clarification in the original PRs way back when, but they were
> > > >     all just ignored.
> > > >
> > > >     -Chris
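Chris's "try using libgomp after a fork" point can be probed in isolation with a small standalone script. The sketch below is an illustration, not MXNet code: it assumes a numpy build whose BLAS (e.g. MKL) drives an OpenMP thread pool, warms that pool up in the parent, forks, and repeats the computation in the child. Per the claim in the thread, Intel OMP tolerates this while libgomp can hang, so the child's behavior depends on which runtime is linked:

    import os
    import numpy as np

    # Assumes numpy's BLAS (e.g. MKL) uses an OpenMP thread pool; with a
    # pthreads-based BLAS this demonstrates nothing either way.
    a = np.random.rand(2000, 2000)
    _ = a @ a  # warm-up: make the runtime spawn its worker threads in the parent

    pid = os.fork()
    if pid == 0:
        # Child: OpenMP worker threads do not survive the fork. Per the claim
        # above, this completes under Intel OMP but can deadlock under libgomp.
        _ = a @ a
        print("child: parallel region after fork completed")
        os._exit(0)
    else:
        os.waitpid(pid, 0)
        print("parent: child finished")

Running it once against a libgomp-linked environment and once against an Intel-OMP-linked one would turn the "intermittent hangs" dispute into a reproducible yes/no result.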