+1 On Tue., 18 Jun. 2019, 12:20 pm Anton Chernov, <[email protected]> wrote:
> I would like to cite Apache Glossary [1]: > > Veto > According to the Apache methodology, a change which has been made or > proposed may be made moot through the exercise of a veto by a committer to > the codebase in question. If the R-T-C commit policy is in effect, a veto > prevents the change from being made. In either the R-T-C or C-T-R > environments, a veto applied to a change that has already been made forces > it to be reverted. Vetos may not be overridden nor voted down, and only > cease to apply when the committer who issued the veto withdraws it. All > vetos must be accompanied by a valid technical justification; a veto > without such a justification is invalid; in case of doubt, deciding whether > a technical justification is valid is up to the PMC. Vetos force discussion > and, if supported, version control rollback or appropriate code changes. > Vetoed code commits are best reverted by the original committer, unless an > urgent solution is needed (e.g., build breakers). Vetos only apply to code > changes; they do not apply to procedural issues such as software releases. > > Therefore the vote opened by Pedro has it's right to exist as a voting of > PMC members on whether the veto of Chris has technical justification. It > would be great if the committee could cast a vote on this to clarify the > situation. > > Best > Anton > > [1] https://www.apache.org/foundation/glossary.html > > > On Mon, 17 Jun 2019 at 22:01, Pedro Larroy <[email protected]> > wrote: > > > Regarding your paste of ldd, not sure what's your point. This is the > > output with the patch from PR applied, the openmp version used is the > > one provided by MKL: > > > > > > Version of the PR that removes openmp from 3rdparty comes from MKL: > > > > linux-vdso.so.1 (0x00007ffc4c993000) > > libmkldnn.so.0 => > > /home/piotr/mxnet_openmp/build/3rdparty/mkldnn/src/libmkldnn.so.0 > > (0x00007f7874519000) > > libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 > > (0x00007f7872273000) > > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 > (0x00007f787206b000) > > libjemalloc.so.1 => /usr/lib/x86_64-linux-gnu/libjemalloc.so.1 > > (0x00007f7871e35000) > > libmklml_intel.so => > > > > > /home/piotr/mxnet_openmp/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libmklml_intel.so > > (0x00007f786a3a3000) > > libiomp5.so => > > > > > /home/piotr/mxnet_openmp/build/mklml/mklml_lnx_2019.0.5.20190502/lib/libiomp5.so > > (0x00007f7869fae000) > > > > > > > > In master, ldd shows that openmp is coming from 3rdparty: > > > > libomp.so => > > /home/piotr/mxnet_master/build/3rdparty/openmp/runtime/src/libomp.so > > (0x00007f1ea2353000) > > > > > > > > Could you please explain your argument on what's the recommended way > > and what's the divergence? As explained in the PR there seems to be > > no performance advantage on adding our own version of openmp. > > > > Thanks. > > > > On Mon, Jun 17, 2019 at 12:55 PM Pedro Larroy > > <[email protected]> wrote: > > > > > > Thanks for your answer Chris. I'm not militant in regards to one > > > option or another nor belong of any club that will accept me as > > > member, but I think we should drive this to conclusion and understand > > > why we keep it or if we should remove it and use the platform provided > > > version. There are open issues linked on that PR that are causing > > > problems, that remain unaddressed it has been said that linking with > > > two openmp versions causes undefined behaviour, I had the experience > > > of having MXNet hang when using OpenMP running the tests in plain > > > ubuntu CPU, I know we are reentering OpenMP initialization when > > > creating threads in the engine and getting an assertion, etc, so I > > > have serious concerns when running MXNet with OpenMP enabled in > > > production. > > > > > > I think you are the expert in OpenMP and HPC and it would be good to > > > have a documented and explainable outcome of one option or another > > > either in the mailing list or in the wiki. > > > > > > I also feel bad for contributors spending time measuring and > > > benchmarking without conclusion, I feel they have given up working on > > > this issue since the PR keeps getting closed and is not moved forward. > > > > > > Please help us understand whats the best way forward with this issue > > > and not just close the PR without explanations. > > > > > > Pedro. > > > > > > On Mon, Jun 17, 2019 at 11:15 AM Chris Olivier <[email protected]> > > wrote: > > > > > > > > I am curious why you're being so militant troll about this. libomp > is > > used > > > > in every MKL build (download mxnet-mkl yourself and see). I don't > see > > any > > > > convincing reason to change it and so far as I can tell, no real > issue > > has > > > > been proven to be related. Anyway, I am reluctant to feed trolls any > > more > > > > than this, so I don't really have much else to add. > > > > > > > > ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so > > > > linux-vdso.so.1 (0x00007ffc989cf000) > > > > libmklml_intel.so => > > > > /usr/local/lib/python3.6/dist-packages/mxnet/libmklml_intel.so > > > > (0x00007f0afb7c1000) > > > > * libiomp5.so => > > > > /usr/local/lib/python3.6/dist-packages/mxnet/libiomp5.so > > > > (0x00007f0afb3e5000)* > > > > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 > > (0x00007f0afb1dd000) > > > > libmkldnn.so.0 => > > > > /usr/local/lib/python3.6/dist-packages/mxnet/libmkldnn.so.0 > > > > (0x00007f0afa7ba000) > > > > libgfortran.so.3 => > > > > /usr/local/lib/python3.6/dist-packages/mxnet/libgfortran.so.3 > > > > (0x00007f0afa493000) > > > > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 > > (0x00007f0afa28f000) > > > > libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 > > > > (0x00007f0af9f06000) > > > > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 > > (0x00007f0af9b68000) > > > > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 > > > > (0x00007f0af9950000) > > > > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 > > > > (0x00007f0af9731000) > > > > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 > > (0x00007f0af9340000) > > > > /lib64/ld-linux-x86-64.so.2 (0x00007f0b073f4000) > > > > libquadmath.so.0 => > > > > /usr/local/lib/python3.6/dist-packages/mxnet/libquadmath.so.0 > > > > (0x00007f0af9100000) > > > > > > > > > > > > On Mon, Jun 17, 2019 at 10:58 AM Pedro Larroy < > > [email protected]> > > > > wrote: > > > > > > > > > I had read the "Apache Voting Process" guide here: > > > > > https://www.apache.org/foundation/voting.html and I thought code > > > > > changes could be discussed on the mailing list in cases where the > PR > > > > > is stuck or there's no response for a long time, also I understood > > > > > that -1's have to be justified. Could you, or someone more > familiar > > > > > which the Apache way enlighten on how to move this issue forward > in a > > > > > constructive way? > > > > > > > > > > Thanks a lot. > > > > > > > > > > Pedro. > > > > > > > > > > On Mon, Jun 17, 2019 at 10:46 AM Pedro Larroy > > > > > <[email protected]> wrote: > > > > > > > > > > > > Thanks. > > > > > > > > > > > > How do we go on advancing this PR then? all the questions have > been > > > > > > answered, performance numbers provided and more. Until how long > > can a > > > > > > veto stand? Also without replies to contributors. > > > > > > > > > > > > Pedro. > > > > > > > > > > > > On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha <[email protected]> > > wrote: > > > > > > > > > > > > > > This vote is invalid as the original PR has been vetoed by a > > > > > committer. A vote on dev@ won't help you circumvent a veto. > > > > > > > > > > > > > > -sz > > > > > > > > > > > > > > On 2019/06/14 23:59:33, Pedro Larroy < > > [email protected]> > > > > > wrote: > > > > > > > > Hi all > > > > > > > > > > > > > > > > This is a 5-day vote to act and wrap up an outstanding PR > that > > > > > removes > > > > > > > > linkage with multiple OpenMP from 3rdparty and uses the > system > > > > > > > > provided one which might resolve a number of difficult to > debug > > > > > issues > > > > > > > > and possible undefined behaviour. > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12160 > > > > > > > > > > > > > > > > See the comments in the thread for more details but in short, > > linking > > > > > > > > with multiple openmp versions seems to lead to undefined > > behaviour, > > > > > > > > plus it would simplify not having to deal with a custom > openmp > > > > > version > > > > > > > > and rely on the platform provided one. > > > > > > > > > > > > > > > > This is expected to simplify builds and address a number of > > problems. > > > > > > > > Seems it doesn't cause any performance degradation, (the > Gluon > > tests > > > > > > > > run almost 4x faster in my 64 core machine). > > > > > > > > > > > > > > > > There has been in depth study of performance implications by > > > > > > > > contributors like Stanislav Tsukrov and Anton Chernov. All > the > > > > > > > > concerns and comments from the reviewers have been addressed > > and we > > > > > > > > can't keep asking open ended questions to block PRs. > Reviewers > > are > > > > > > > > expected to be proactive and responsive to contributors so we > > keep > > > > > > > > encouraging active contributors. > > > > > > > > > > > > > > > > please vote to merge this PR accordingly: > > > > > > > > > > > > > > > > +1 = approve > > > > > > > > +0 = no opinion > > > > > > > > -1 = disapprove (provide reason) > > > > > > > > > > > > > > > > If we observe regressions reported by any internal > performance > > > > > systems > > > > > > > > or by contributors the PR can be reverted easily. So it's not > > a one > > > > > > > > way door. But it will be useful to try this in master for a > > while. > > > > > > > > > > > > > > > > Thank you. > > > > > > > > > > > > > > > > Pedro. > > > > > > > > > > > > > > > >
