0 (non-binding) If we have a problem that blocks users, and a solution in
hand... then we should fix it, but not at the expense of starting the
release cycle again just for one fix. Users can cherry pick or build from
master if they want the fix right away, right? I'd change my mind to -1 if
this wasn't the case, with good reason, and if the user impact was critical
to adoption or risks abandonment.


On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <roshaninagmo...@gmail.com>
wrote:

> I believe everyone here is working hard to make MXNet a better framework
> for users. It's completely okay to have different opinions, we can decide
> together if this issue is a blocker or not after voting time is over.
>
> As I mentioned before, voting will end at 7 pm today. So there is still
> time to test the release. If there are any other issues anyone finds, I
> will be happy to start the process again and work on RC1. For now, I want
> to encourage everyone to utilize this time and vote. :)
>
> Thanks,
> Roshani
>
> On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> sandeep.krishn...@gmail.com> wrote:
>
> >    1. As a Apache MXNet community member, I raised the concern of broken
> >    functionality for the user. I explained and provided the data points
> on
> > the
> >    issue, workaround and why I think it is important. If after all this,
> > you
> >    think my vote is biased on my employer just because a user I quoted is
> > from
> >    Amazon, this is more concerning to me on my voting abilities.
> >    2. My -1 no where undermines the huge amount of effort that goes
> behind
> >    the scene for a release to happen. Great respect and recognition for
> >    everyone involved in all the releases of MXNet in the past and this. I
> >    voted on my judgement of what may be good for the users of MXNet.
> >    3. As pointed by Naveen & Chris, -1 are NOT veto. Feel free to decide
> >    and progress on the release as we already have >3 +1 in this thread.
> >
> >
> > Best,
> >
> > Sandeep
> >
> > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <cjolivie...@gmail.com>
> > wrote:
> >
> > > btw, there are no vetoes on package releases:
> > >
> > > VOTES ON PACKAGE RELEASES
> > > <https://www.apache.org/foundation/voting.html#ReleaseVotes>
> > >
> > > Votes on whether a package is ready to be released use majority
> approval
> > > <https://www.apache.org/foundation/glossary.html#MajorityApproval> --
> > i.e.
> > > at least three PMC members must vote affirmatively for release, and
> there
> > > must be more positive than negative votes.Releases may not be vetoed.
> > > Generally
> > > the community will cancel the release vote if anyone identifies serious
> > > problems, but in most cases the ultimate decision, lies with the
> > individual
> > > serving as release manager. The specifics of the process may vary from
> > > project to project, but the 'minimum quorum of three +1 votes' rule is
> > > universal.
> > >
> > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <szha....@gmail.com> wrote:
> > >
> > > > Thanks for sharing your opinions, Thomas. Your recognition and
> respect
> > of
> > > > people's efforts on preparing the release candidate are certainly
> > > > appreciated.
> > > >
> > > > Now that the vote is set to fail thanks to the veto, there will be
> > plenty
> > > > of opportunities to include those bug fixes, including the one Zhi
> > > > mentioned [1], which was already merged in the master and yet chose
> not
> > > to
> > > > block this release with [2]. I will be happy to work with Roshani to
> > > > prepare another release candidate once ready.
> > > >
> > > > -sz
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> > > > [2]
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
> > > >
> > > > On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <
> > thomas.delte...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > -0
> > > > > (non-binding)
> > > > >
> > > > > If I may add some nuancing plus a personal data point as one of the
> > > users
> > > > > commenting in the bug report in question:
> > > > >
> > > > > - Performance vs. Basic functionality => I don't think high
> > performance
> > > > > use-cases and basic functionality are two obviously opposed
> concepts
> > > and
> > > > > see no contradiction in Hagay's and Sandeep's statements.
> > > > > Float16 support is feature of MXNet that provides more than twice
> the
> > > > > performance of Float32 on supported platforms, hence the high
> > > performance
> > > > > use-case. The bug is that the basic functionality of reloading a
> > saved
> > > > > float16 models is currently broken.
> > > > >
> > > > > - This bug vs Other bugs => Contrary the vast majority of the 140
> > open
> > > > bugs
> > > > > that are mentioned above, I would put to Sandeep's credit that this
> > one
> > > > bug
> > > > > has a PR open that provides a fix for it. This would make it a
> better
> > > > > candidate to get included in this release than a bug that has no
> fix
> > > > ready
> > > > > for it.
> > > > >
> > > > > - Personal datapoint: I recently did some experimentation with
> > float16
> > > > [1]
> > > > > and actually coincidentally just published a video on optimizing
> > > > > performance for Gluon. Float16 conversion is one of the most, if
> not
> > > the
> > > > > most effective way to get performance out of MXNet [2]. I believe
> > there
> > > > is
> > > > > a lot of value in publicizing more its use and hence making sure at
> > > least
> > > > > the basic support for normal use-cases is present.
> > > > >
> > > > > Of course this needs to be balanced with the overhead of preparing
> a
> > > new
> > > > > release candidate once the fixed is reviewed and merged, which
> seems
> > to
> > > > be
> > > > > a lengthy and complex process in its own right, and the delay with
> > > > > providing the other features present in 1.3 for users that are not
> > > > running
> > > > > off the nightly builds.
> > > > >
> > > > > All the best,
> > > > >
> > > > > Thomas
> > > > >
> > > > > [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
> > > > >
> > > > > Le mar. 4 sept. 2018 à 17:11, Sheng Zha <szha....@gmail.com> a
> > écrit :
> > > > >
> > > > > > Sandeep,
> > > > > >
> > > > > > Thanks for explaining your veto. We have open bugs that impacted
> a
> > > lot
> > > > > more
> > > > > > than just 3 customers, just by referring to the number of
> > commenters
> > > on
> > > > > the
> > > > > > issue [1].
> > > > > >
> > > > > > You said that this is for "high performance use cases", which
> > > > contradicts
> > > > > > with Hagay's assement that this is "basic functionality broken".
> > > Given
> > > > > that
> > > > > > this is for advanced use cases of using half-precision training,
> > why
> > > is
> > > > > it
> > > > > > so much more important than any other open bug reports, that for
> > this
> > > > > > specific bug fix, we have to delay the access of regular users to
> > the
> > > > new
> > > > > > MXNet 1.3 release by at least another week?
> > > > > >
> > > > > > Honestly, I'm concerned that your vote is biased by Amazon
> > > involvement,
> > > > > > given that you quoted Amazon Rekognition.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
> > > > > >
> > > > > > On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <
> > > > > > sandeep.krishn...@gmail.com> wrote:
> > > > > >
> > > > > > > My initial vote of “-0” was due to lack of info from a user who
> > had
> > > > > said,
> > > > > > > he overcame this issue for FP16 model.
> > > > > > >
> > > > > > >
> > > > > > > However, suggested workaround [1] for the issue is not straight
> > > > forward
> > > > > > and
> > > > > > > generally usable for all users. Also, issue is not simple and
> > > > isolated
> > > > > to
> > > > > > > be listed in the Release Notes as known issue with a
> workaround.
> > > > > > >
> > > > > > >
> > > > > > > Changing my vote to: "-1 (binding)" owing to the user impact
> [3]
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > @Sheng:
> > > > > > >
> > > > > > > 1. Agreed, bug existed from long time. However, FP16 and such
> > > > > > optimizations
> > > > > > > were added later on. Followed by users [2] using this feature
> for
> > > > high
> > > > > > > performance use cases. It is not ok to measure severity of the
> > bug
> > > > > based
> > > > > > on
> > > > > > > its past existence, rather we can see who is impacted now and
> is
> > > it a
> > > > > > small
> > > > > > > subset with a simple workaround or large user impacting issue.
> > > > > > >
> > > > > > > 2. Agreed bug was reported 7/21. However, I became aware of
> this
> > > > issue
> > > > > on
> > > > > > > 08/29 and submitted the fix on 08/30. Also, I did bring this to
> > the
> > > > > > notice
> > > > > > > of community, you and 1.3 release manager (Roshani) on the RC0
> > > > proposal
> > > > > > > thread. Also, I would focus on the issue and user impact than
> who
> > > > > > > identified and who is fixing the issue.
> > > > > > >
> > > > > > >
> > > > > > > Based on my discussion with 2 users, I think it is a important
> > > > feature
> > > > > > for
> > > > > > > them to see in Apache MXNet v1.3.0.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Sandeep
> > > > > > >
> > > > > > >
> > > > > > > [1] Workaround used by the user.
> > > > > > >
> > > > > > >
> > > > > > > net_fp16 =
> > > mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json',
> > > > > > > ['data'])
> > > > > > >
> > > > > > > params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
> > > > > > >
> > > > > > >
> > > > > > > for k, v in params_fp16.items():
> > > > > > >
> > > > > > >     new_key = k.split(':')[1]
> > > > > > >
> > > > > > >     net_fp16.collect_params()[new_key].cast(v.dtype)
> > > > > > >
> > > > > > >
> > > > > > > net_fp16.collect_params().load('resnet34_fp16-0000.params',
> ctx)
> > > > > > >
> > > > > > >
> > > > > > > [2] Amazon Rekognition
> > > > > > >
> > > > > > >
> > > > > > > [3] User story: Train a model -> Cast it to FP16 -> Save the
> > model
> > > ->
> > > > > > Load
> > > > > > > back the model does not work. They have to cast every parameter
> > > with
> > > > a
> > > > > > > workaround mentioned above [1].
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <
> lupe...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Sheng,
> > > > > > > >
> > > > > > > > Addressing your questions:
> > > > > > > >
> > > > > > > > - "why this specific bug is more important than all the other
> > > known
> > > > > > bugs,
> > > > > > > > that this becomes a release blocker"
> > > > > > > > I do not consider it to be more or less important than other
> > > fixes.
> > > > > It
> > > > > > > can
> > > > > > > > be fixed and included in the release alongside the rest of
> the
> > > > > release
> > > > > > > > content, right?
> > > > > > > > From the description of the issue it seems important since it
> > is
> > > > > > blocking
> > > > > > > > users from loading models that were previously trained and
> > saved.
> > > > > There
> > > > > > > is
> > > > > > > > nothing stopping the community from including this fix into
> > > 1.3.0,
> > > > > > > > alongside the rest of the features and fixes.
> > > > > > > >
> > > > > > > > - "The bug exists since SymbolBlock was introduced a year ago
> > and
> > > > has
> > > > > > > > survived at least three releases, so this is not a
> regression."
> > > > > > > > I do not think I said it is a regression. However, the fact a
> > bug
> > > > > > existed
> > > > > > > > before, does not mean it is OK to release it rather than fix
> > it.
> > > > > > > >
> > > > > > > > - "Timeline-wise, this bug was reported on 7/21, but was not
> > > > reported
> > > > > > as
> > > > > > > > release-blocker in the release discussion thread until 8/31
> > [1].
> > > > > > Neither
> > > > > > > > its reporting as release-blocker nor its fix made it for the
> > 8/3
> > > > code
> > > > > > > > freeze."
> > > > > > > > You are right, would have been better to have this identified
> > and
> > > > > fixed
> > > > > > > > earlier and included before code freeze.
> > > > > > > >
> > > > > > > > - "The PR is still not ready yet as it doesn't have
> approval."
> > > > > > > > I think it is waiting for your review.
> > > > > > > >
> > > > > > > > - "it would be great if you could provide some additional
> > > reasoning
> > > > > > > besides
> > > > > > > > "X mentions the issue" or "fix was done by X""
> > > > > > > > I have. Repeating what I wrote in my previous email for
> > clarity:
> > > > > Basic
> > > > > > > > functionality broken: loading a model (albeit one that that
> was
> > > > saved
> > > > > > as
> > > > > > > > non FP32)
> > > > > > > >
> > > > > > > > So, yes - this issue seems to have been out there for a
> while,
> > > > > somehow
> > > > > > > went
> > > > > > > > under the radar... but I think the key question is whether
> this
> > > > > blocks
> > > > > > a
> > > > > > > > basic functionality in MXNet. I believe so, hence my -1 vote.
> > > > > > > >
> > > > > > > > Hagay
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <szha....@gmail.com
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Hagay and Sandeep,
> > > > > > > > >
> > > > > > > > > Could you help us understand why this specific bug is more
> > > > > important
> > > > > > > than
> > > > > > > > > all the other known bugs, that this becomes a release
> > blocker?
> > > > > > > > >
> > > > > > > > > Some facts to consider:
> > > > > > > > > - The bug exists since SymbolBlock was introduced a year
> ago
> > > and
> > > > > has
> > > > > > > > > survived at least three releases, so this is not a
> > regression.
> > > > > > > > > - Timeline-wise, this bug was reported on 7/21, but was not
> > > > > reported
> > > > > > as
> > > > > > > > > release-blocker in the release discussion thread until 8/31
> > > [1].
> > > > > > > Neither
> > > > > > > > > its reporting as release-blocker nor its fix made it for
> the
> > > 8/3
> > > > > code
> > > > > > > > > freeze.
> > > > > > > > > - The PR is still not ready yet as it doesn't have
> approval.
> > > > > > > > >
> > > > > > > > > Hagay, it would be great if you could provide some
> additional
> > > > > > reasoning
> > > > > > > > > besides "X mentions the issue" or "fix was done by X".
> > Thanks.
> > > > > > > > >
> > > > > > > > > -sz
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > > >
> > > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> > > lupe...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Sandeep mentions the issue of an error when user tries to
> > > load
> > > > > > model
> > > > > > > > > params
> > > > > > > > > > trained/saved as FP16.
> > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12412 and
> > is
> > > > > ready
> > > > > > to
> > > > > > > > be
> > > > > > > > > > cherry picked into the release branch.
> > > > > > > > > >
> > > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > > - Basic functionality broken: loading a model (albeit one
> > > that
> > > > > that
> > > > > > > was
> > > > > > > > > > saved as non FP32)
> > > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> > ThomasDelteil@
> > > )
> > > > > > > > > >
> > > > > > > > > > -1 (non binding)
> > > > > > > > > >
> > > > > > > > > > Hagay
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > > > > > > > sandeep.krishn...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > "- 0"
> > > > > > > > > > >
> > > > > > > > > > > I believe the bug #11849
> > > > > > > > > > > <
> https://github.com/apache/incubator-mxnet/issues/11849
> > >,
> > > > > unable
> > > > > > > to
> > > > > > > > > > import
> > > > > > > > > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12412>
> > is
> > > > > > > important
> > > > > > > > > for
> > > > > > > > > > > the
> > > > > > > > > > > users. I would rather pick this fix in this release
> than
> > > > plan a
> > > > > > > minor
> > > > > > > > > > > release later.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Sandeep
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > > > chohy...@cs.washington.edu>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > > https://github.com/apache/incubator-mxnet -b
> > 1.3.0.rc0"
> > > > > works
> > > > > > > fine
> > > > > > > > > > now,
> > > > > > > > > > > > never mind.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > > > chohy...@cs.washington.edu>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Unfortunately, MXNet was depending on a branch of
> TVM
> > > > that
> > > > > is
> > > > > > > now
> > > > > > > > > > > > deleted.
> > > > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > > > <
> > https://github.com/apache/incubator-mxnet/pull/12448>
> > > > > > before
> > > > > > > > the
> > > > > > > > > > > > release.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > > > > >.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Philip.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > > > > > carinme...@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >> Checked out the tag, built and tested the Clojure
> > > > package.
> > > > > > +1
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > > > > > > > > > >> roshaninagmo...@gmail.com>
> > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > I would like to propose a vote to release Apache
> > > MXNet
> > > > > > > > > > (incubating)
> > > > > > > > > > > > >> version
> > > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug
> > 31st)
> > > > and
> > > > > > end
> > > > > > > at
> > > > > > > > > > 7:00
> > > > > > > > > > > PM
> > > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > > >> >
> > https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > > >> > *
> > > > > > > > >
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > > > >> > <
> > > > > > > > >
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > > >0*
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > View this page, click on "Build from Source",
> and
> > > use
> > > > > the
> > > > > > > > source
> > > > > > > > > > > code
> > > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > > >> >
> > > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Please remember to TEST first before voting
> > > > accordingly:
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > >> > Roshani
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Sandeep Krishnamurthy
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>

Reply via email to