Hi Chris

I would ask you to have a bit of patience and help us with your
experience in this matter. Nobody is ignoring anything. I think we are
each gathering feedback and trying to understand the multiple
contributions made on this topic, including yours. The plan is to go
step by step: understand what is going on, run experiments, and report
back to the list or the corresponding GitHub issue. Kellen suggested
preparing some containers, and that takes effort.
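
As a starting point for such containers, here is a minimal sketch (not
something from the MXNet tree) of a Python helper that lists which
OpenMP runtimes appear in `ldd` output, so a build that links more than
one can be flagged. The function name and the set of library-name
prefixes are my own assumptions and may need adjusting per platform.

```python
import re

# Assumed OpenMP runtime name prefixes: libomp (LLVM/Intel-derived),
# libiomp (Intel), libgomp (GNU). Extend as needed for your platform.
OPENMP_PREFIXES = ("libomp", "libiomp", "libgomp")


def openmp_runtimes(ldd_output):
    """Return the sorted, de-duplicated OpenMP library names in ldd output."""
    found = set()
    for line in ldd_output.splitlines():
        # ldd prints resolved dependencies as "<name> => <path> (<address>)".
        match = re.match(r"\s*(\S+)\s+=>", line)
        if match and match.group(1).startswith(OPENMP_PREFIXES):
            found.add(match.group(1))
    return sorted(found)
```

Feeding it the output of `ldd libmxnet.so` should return a single entry
for a clean build; two or more entries would indicate the double-linking
case discussed further down in this thread.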

Regarding your final comment, most of us also have many other things
to do and other responsibilities, even if our daytime jobs might involve
MXNet in some form or another. I think that's part of the privilege
and responsibility of working closely with an open source project and
the magic of collaboration across organizations. Let's all be patient
and take some time to understand and reason about this topic, which is
not simple. Since we decided to step back and gather more data, let's
take time and do it properly.
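
To make the reported numbers easier to compare, here is a rough sketch
of how the "Speed: ... samples/sec" lines from a training log could be
summarized. It is only an illustration: the function name is made up,
and skipping the first reading as warm-up is my assumption, not an
agreed methodology.

```python
import re
import statistics

# Matches the throughput figure in lines such as
# "INFO:root:Epoch[0] Batch [100-200]  Speed: 16073.21 samples/sec".
SPEED_RE = re.compile(r"Speed:\s*([0-9.]+)\s*samples/sec")


def summarize_speeds(log_text, skip_first=1):
    """Return (mean, median) samples/sec, dropping `skip_first` warm-up readings."""
    speeds = [float(value) for value in SPEED_RE.findall(log_text)]
    speeds = speeds[skip_first:]
    if not speeds:
        raise ValueError("no Speed readings left after skipping warm-up")
    return statistics.mean(speeds), statistics.median(speeds)
```

Running this over the logs from each build, ideally from several runs
on the same machine, would give one pair of numbers per configuration
instead of a wall of log lines.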

Personally I hope to find time to look again into this issue before
the end of the week.

Thanks.

Pedro.

On Wed, Jun 19, 2019 at 2:43 PM Chris Olivier <cjolivie...@apache.org> wrote:
>
> If you’re linking in two then you’re doing something wrong. You can see by
> my email yesterday that only one is linked in. This is also the case with
> the mkl version built by the Makefile — only the Intel OMP library is used
> (no libgomp).
>
> That being said, do you have clear evidence that using Intel OMP is both
> problematic and the situation isn’t fixable?  The burden of proof is on the
> ones requesting the change — it is not my responsibility to justify the
> current state.  There must be something “terrible” and unfixable to justify
> a change.  I have seen no proof of this in all this time.
>
> On a side note, I mentioned a couple of things in my email yesterday that
> still are not being responded to (they were also ignored in the last
> incarnation of this “discussion” — I have much experience in this matter to
> assume “discussion” is a waste of my time, seeing as I am not paid to
> “work on” mxnet like y’all are).
>
> -C
>
> On Wed, Jun 19, 2019 at 10:28 AM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > I've also quite often seen two versions of OpenMP linked.  I think we can
> > all agree we probably want to avoid linking in two libraries that do
> > effectively the same thing.
> >
> > The performance questions should be fairly straightforward to demonstrate,
> > right?  Could we just collaborate on a few minimal Dockerfiles that show
> > (or don't show) Intel OpenMP performance speedups with the workloads Chris
> > is referencing?
> >
> > On Wed, Jun 19, 2019 at 4:44 AM Tsukrov, Stanislav <
> > stanislav.tsuk...@gmail.com> wrote:
> >
> > > Hi, Chris!
> > >
> > > Stas here - I've gathered that performance data.
> > > Sure thing, I can be wrong, but please elaborate a bit on what we are
> > > missing.
> > > Be assured, intentional misdirection was never the case.
> > >
> > > Thanks a lot for being constructive.
> > >
> > > > > Turning Intel OMP on and off (and MKL as well, since it tends to pull in
> > > > > omp, depending which one is linked in).
> > >
> > > We never ever considered turning MKL off. We are on the same page here -
> > > MKL is crucial for the performance.
> > > Why should we? There's a GOMP-linked version of MKL, that we can use.
> > >
> > > What we did was measure whether using the compiler's default OpenMP
> > > implementation, instead of the referenced source code distribution of
> > > OpenMP, makes anything slower.
> > > We have found the impact to be hardly measurable.
> > > The difference between GOMP and iOMP is <5% on our benchmarks, most of the
> > > time less than that.
> > >
> > > We are just suggesting simplifying the build of mxnet by removing the
> > > unnecessary dependency.
> > >
> > > Along the way we discovered, for example, the following amazing issue:
> > > https://github.com/apache/incubator-mxnet/issues/14087
> > >
> > > Best Regards
> > >
> > > Stas
> > >
> > > On 18.06.19, 18:24, "Chris Olivier" <cjolivie...@gmail.com> wrote:
> > >
> > >     I am very reluctant to feed the trolls again, and this will be the last
> > >     time I address Pedro or Anton on the subject, but since I think the numbers
> > >     being presented are incorrect (either by the builders not really
> > >     understanding what they are building, or possibly intentional misdirection):
> > >
> > >     Turning Intel OMP on and off (and MKL as well, since it tends to pull in
> > >     omp, depending which one is linked in).
> > >     There is a HUGE difference.  This is consistent with my experience before
> > >     when it was added.
> > >
> > >
> > >     default mnist:
> > >
> > >     python ../example/image-classification/train_mnist.py
> > >     INFO:root:start with arguments Namespace(add_stn=False, batch_size=64,
> > >     disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none',
> > >     gpus=None, image_shape='1, 28, 28', initializer='default',
> > >     kv_store='device', load_epoch=None, loss='', lr=0.05, lr_factor=0.1,
> > >     lr_step_epochs='10', macrobatch_size=0, model_prefix=None, mom=0.9,
> > >     monitor=0, network='mlp', num_classes=10, num_epochs=20,
> > >     num_examples=60000, num_layers=None, optimizer='sgd',
> > >     profile_server_suffix='', profile_worker_suffix='', save_period=1,
> > >     test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
> > >
> > >     INTEL OMP:
> > >
> > >     ldd libmxnet.so | grep omp
> > >             libomp.so => /home/chris/src/mxnet/cmake_omp/3rdparty/openmp/runtime/src/libomp.so (0x00007f978fde7000)
> > >
> > >     :root:Epoch[0] Batch [0-100]        Speed: 31548.09 samples/sec  accuracy=0.780012
> > >     INFO:root:Epoch[0] Batch [100-200]      Speed: 16073.21 samples/sec  accuracy=0.920469
> > >     INFO:root:Epoch[0] Batch [200-300]      Speed: 19075.91 samples/sec  accuracy=0.928281
> > >     INFO:root:Epoch[0] Batch [300-400]      Speed: 23211.36 samples/sec  accuracy=0.942813
> > >     INFO:root:Epoch[0] Batch [400-500]      Speed: 22139.79 samples/sec  accuracy=0.938750
> > >     INFO:root:Epoch[0] Batch [500-600]      Speed: 23225.52 samples/sec  accuracy=0.946562
> > >     INFO:root:Epoch[0] Batch [600-700]      Speed: 19547.41 samples/sec  accuracy=0.953281
> > >     INFO:root:Epoch[0] Batch [700-800]      Speed: 24111.73 samples/sec  accuracy=0.951562
> > >     INFO:root:Epoch[0] Batch [800-900]      Speed: 13959.88 samples/sec  accuracy=0.957500
> > >     INFO:root:Epoch[0] Train-accuracy=0.925423
> > >     INFO:root:Epoch[0] Time cost=3.806
> > >     INFO:root:Epoch[0] Validation-accuracy=0.962580
> > >     INFO:root:Epoch[1] Batch [0-100]        Speed: 24560.21 samples/sec  accuracy=0.968131
> > >     INFO:root:Epoch[1] Batch [100-200]      Speed: 23457.03 samples/sec  accuracy=0.966250
> > >
> > >
> > >     LIBGOMP:
> > >
> > >     ldd libmxnet.so | grep omp
> > >             libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f25c25dd000)
> > >
> > >     INFO:root:Epoch[0] Batch [0-100]        Speed: 1731.01 samples/sec  accuracy=0.782488
> > >     INFO:root:Epoch[0] Batch [100-200]      Speed: 3551.32 samples/sec  accuracy=0.907813
> > >     INFO:root:Epoch[0] Batch [200-300]      Speed: 1991.00 samples/sec  accuracy=0.927188
> > >     INFO:root:Epoch[0] Batch [300-400]      Speed: 2175.45 samples/sec  accuracy=0.937969
> > >     INFO:root:Epoch[0] Batch [400-500]      Speed: 1644.95 samples/sec  accuracy=0.942187
> > >     INFO:root:Epoch[0] Batch [500-600]      Speed: 6444.58 samples/sec  accuracy=0.950156
> > >     INFO:root:Epoch[0] Batch [600-700]      Speed: 7842.16 samples/sec  accuracy=0.947969
> > >     INFO:root:Epoch[0] Batch [700-800]      Speed: 9412.07 samples/sec  accuracy=0.953750
> > >     INFO:root:Epoch[0] Batch [800-900]      Speed: 12707.58 samples/sec  accuracy=0.953125
> > >
> > >     That being said, there are other issues beyond speed.  The DEFAULT build from
> > >     makefile (not CMake) uses Intel OMP mkl (I showed before) and mysteriously
> > >     it has no issues?  This seems highly suspicious.  All I see is a lot of
> > >     hand-waving and conjecture and pointing to StackOverflow posts made by
> > >     people who may be of questionable pedigree to begin with.  This smells of a
> > >     Pedro-ego-fight rather than one of purely technical merit.  Also, if one
> > >     knows how OMP works, they would be very suspicious of the "intermittent
> > >     hangs" claim -- that's probably just broken race conditions elsewhere until
> > >     proven differently.  It'd tend to freeze on the first use if something is
> > >     wrong (try using libgomp after a fork and see), since worker threads
> > >     wouldn't be assigned/joined properly.  IntelOMP is faster, but also has
> > >     other advantages, such as allowing OMP after a fork.
> > >
> > >     I actually addressed a lot of issues and asked for clarification in the
> > >     original PRs way back when, but they're all just ignored.
> > >
> > >     -Chris
> > >
> > >
> > >
> > >
> >
