Hi Hagay,

We are aware of this and are working in this direction which, as you point out, is more desirable. A huge amount of non-trivial work from Sheng has gone into building these distribution packages; it needs to be taken into consideration and adapted for our CI system.

Pedro.

On Mon, Jul 30, 2018 at 9:07 AM Hagay Lupesko <lupe...@gmail.com> wrote:

> Thanks Tong for root-causing the issue!
> Thanks Sheng for following up with an updated PyPi package.
>
> What worries me is that we seem to build MXNet PyPi distribution packages
> with a build config different than the CI where all of the tests are
> running.
> Looking here [1] it seems that MXNet CI Ubuntu build uses libopenblas-dev
> v0.2.18, while PyPi build for MXNet 1.2.1 used v0.3.2 (I would imagine PyPi
> distribution?)
>
> Needless to say that if we don't make sure PyPi distribution is aligned
> with the CI build, similar issues can happen again with other dependencies.
> I'd think we want the build configs to be the same, or better yet have the
> PyPi package be built from the output produced by the CI.
> Thoughts?
>
> [1]
> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh
>
> On Fri, Jul 27, 2018 at 11:31 AM Sheng Zha <szha....@gmail.com> wrote:
>
> > Tong,
> >
> > That's great news. I'm glad that OpenBLAS people are responding so quickly.
> > In that case it's probably a better idea to use that version instead. The
> > latest OpenBLAS version brings many optimizations for all kinds of hardware.
> >
> > -sz
> >
> > On Fri, Jul 27, 2018 at 11:10 AM, Tong He <hetong...@gmail.com> wrote:
> >
> > > Hi Sheng,
> > >
> > > I also opened an issue on OpenBLAS repo:
> > > https://github.com/xianyi/OpenBLAS/issues/1700 .
> > >
> > > As informed that "0.3.2 should be released this weekend", I tested their
> > > develop branch as well, and it seems the new version has fixed the bug.
> > >
> > > Since OpenBLAS 0.3.2 could also have performance improvements, I
> > > propose to wait for OpenBLAS 0.3.2 for our pip post release.
> > >
> > > Best regards,
> > >
> > > Tong He
> > >
> > > 2018-07-27 10:54 GMT-07:00 Sheng Zha <szha....@gmail.com>:
> > >
> > > > Forgot to mention, the post release version is a pip package version.
> > > >
> > > > -sz
> > > >
> > > > > On Jul 27, 2018, at 10:42 AM, Sheng Zha <szha....@gmail.com> wrote:
> > > > >
> > > > > In this case we can regard it as a release problem, which is usually
> > > > > what post release versions are for. It’s still the same release with
> > > > > different dependency, so there is no code change needed.
> > > > >
> > > > > -sz
> > > > >
> > > > >> On Jul 27, 2018, at 8:31 AM, Steffen Rochel <steffenroc...@gmail.com> wrote:
> > > > >>
> > > > >> Hi Tong - thanks for root causing the problem.
> > > > >> Sheng - what is 1.2.1.post0? Shouldn't a patch with fix be released as
> > > > >> 1.2.2?
> > > > >> Steffen
> > > > >>
> > > > >>> On Thu, Jul 26, 2018 at 5:33 PM Sheng Zha <szha....@gmail.com> wrote:
> > > > >>>
> > > > >>> Dear users and developers of Apache MXNet (Incubating),
> > > > >>>
> > > > >>> Thanks to Tong's dedication, the root cause for this issue was identified
> > > > >>> to be instability in OpenBLAS's latest stable version 0.3.1. For details,
> > > > >>> see Tong's comment
> > > > >>> <https://github.com/apache/incubator-mxnet/issues/11853#issuecomment-408272772>.
> > > > >>>
> > > > >>> Since both the nightly build and the 1.2.1 wheels are affected, we
> > > > >>> recommend that we stay on OpenBLAS last known stable version 0.2.20 that
> > > > >>> we've been using. I will assume lazy consensus and prepare the fix
> > > > >>> (1.2.1.post0).
> > > > >>>
> > > > >>> -sz
> > > > >>>
> > > > >>>> On Tue, Jul 24, 2018 at 3:35 PM, Tong He <t...@apache.org> wrote:
> > > > >>>>
> > > > >>>> Recently there's an issue regarding the inconsistent result from gluon
> > > > >>>> forward:
> > > > >>>>
> > > > >>>> https://github.com/apache/incubator-mxnet/issues/11853
> > > > >>>>
> > > > >>>> Given a constant input image and loaded pretrained parameters, we expect a
> > > > >>>> deterministic output from arbitrary repeats of forwards. However from the
> > > > >>>> issue I see that the forwarded result is non-deterministic. It is harmful as
> > > > >>>> it makes the results from experiments/benchmarks/inference meaningless.
> > > > >>>>
> > > > >>>> Therefore I propose to block the 1.3 release until it gets resolved.
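
The expectation in Tong's original report (constant input, fixed parameters, identical output on every forward pass) can be illustrated with a small check. This is only a sketch under assumptions: `is_deterministic`, `stable_forward`, and `flaky_forward` are hypothetical names invented for illustration, the "models" are plain Python functions rather than MXNet/Gluon networks, and nothing here reproduces the actual OpenBLAS bug.

```python
import itertools
from typing import Callable, List, Sequence


def is_deterministic(forward: Callable[[Sequence[float]], List[float]],
                     inputs: Sequence[float],
                     repeats: int = 10) -> bool:
    """Return True if `forward` yields identical output on every repeat."""
    reference = list(forward(inputs))
    for _ in range(repeats - 1):
        if list(forward(inputs)) != reference:
            return False
    return True


def stable_forward(x: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for a well-behaved forward pass: a fixed linear map.
    return [2.0 * v + 1.0 for v in x]


_calls = itertools.count()


def flaky_forward(x: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for an unstable backend: the result drifts
    # slightly with every call even though the input never changes.
    drift = next(_calls) * 1e-3
    return [2.0 * v + 1.0 + drift for v in x]


if __name__ == "__main__":
    sample = [0.5, 1.0, 2.0]
    print("stable:", is_deterministic(stable_forward, sample))
    print("flaky:", is_deterministic(flaky_forward, sample))
```

A check of this shape, run against the packaged wheel rather than only the CI build, is one way the kind of mismatch discussed in this thread might be caught before release; whether exact equality or a tolerance is appropriate would depend on the operators involved.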