"Releases may not be vetoed" http://www.apache.org/legal/release-policy.html#release-approval
I haven't tested the release yet, I'll do so tomorrow.

> On Sep 4, 2018, at 7:13 PM, Sheng Zha <szha....@gmail.com> wrote:
>
> Thanks for sharing your opinions, Thomas. Your recognition and respect of
> people's efforts on preparing the release candidate are certainly
> appreciated.
>
> Now that the vote is set to fail thanks to the veto, there will be plenty
> of opportunities to include those bug fixes, including the one Zhi
> mentioned [1], which was already merged into master and yet which he chose
> not to block this release with [2]. I will be happy to work with Roshani
> to prepare another release candidate once ready.
>
> -sz
>
> [1] https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> [2] https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E
>
> On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <thomas.delte...@gmail.com> wrote:
>
>> -0 (non-binding)
>>
>> If I may add some nuance, plus a personal data point as one of the users
>> who commented on the bug report in question:
>>
>> - Performance vs. basic functionality => I don't think high-performance
>> use cases and basic functionality are two obviously opposed concepts, and
>> I see no contradiction in Hagay's and Sandeep's statements.
>> Float16 support is a feature of MXNet that provides more than twice the
>> performance of Float32 on supported platforms, hence the high-performance
>> use case. The bug is that the basic functionality of reloading a saved
>> float16 model is currently broken.
>>
>> - This bug vs. other bugs => Contrary to the vast majority of the 140 open
>> bugs mentioned above, I would put it to Sandeep's credit that this one bug
>> has an open PR that provides a fix for it. This makes it a better
>> candidate for inclusion in this release than a bug that has no fix ready.
>>
>> - Personal data point: I recently did some experimentation with float16
>> [1] and coincidentally just published a video on optimizing performance
>> for Gluon. Float16 conversion is one of the most effective ways, if not
>> the most effective way, to get performance out of MXNet [2]. I believe
>> there is a lot of value in publicizing its use more, and hence in making
>> sure that at least basic support for normal use cases is present.
>>
>> Of course, this needs to be balanced against the overhead of preparing a
>> new release candidate once the fix is reviewed and merged, which seems to
>> be a lengthy and complex process in its own right, and against the delay
>> in providing the other features present in 1.3 to users who are not
>> running off the nightly builds.
>>
>> All the best,
>>
>> Thomas
>>
>> [1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
>> [2] https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m
>>
>>> On Tue, Sep 4, 2018 at 5:11 PM, Sheng Zha <szha....@gmail.com> wrote:
>>>
>>> Sandeep,
>>>
>>> Thanks for explaining your veto. We have open bugs that impacted a lot
>>> more than just 3 customers, judging by the number of commenters on the
>>> issue [1].
>>>
>>> You said that this is for "high performance use cases", which contradicts
>>> Hagay's assessment that this is "basic functionality broken". Given that
>>> this is an advanced use case of half-precision training, why is it so
>>> much more important than any other open bug report that, for this
>>> specific bug fix, we have to delay regular users' access to the new
>>> MXNet 1.3 release by at least another week?
>>>
>>> Honestly, I'm concerned that your vote is biased by Amazon involvement,
>>> given that you quoted Amazon Rekognition.
>>>
>>> -sz
>>>
>>> [1] https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc
>>>
>>> On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <sandeep.krishn...@gmail.com> wrote:
>>>
>>>> My initial vote of "-0" was due to lack of info from a user who had said
>>>> he overcame this issue for an FP16 model.
>>>>
>>>> However, the suggested workaround [1] for the issue is not
>>>> straightforward or generally usable for all users. Also, the issue is
>>>> not simple and isolated enough to be listed in the release notes as a
>>>> known issue with a workaround.
>>>>
>>>> Changing my vote to "-1 (binding)" owing to the user impact [3].
>>>>
>>>> @Sheng:
>>>>
>>>> 1. Agreed, the bug has existed for a long time. However, FP16 and
>>>> similar optimizations were added later on, followed by users [2]
>>>> adopting this feature for high-performance use cases. It is not OK to
>>>> measure the severity of the bug by how long it has existed; rather, we
>>>> should look at who is impacted now, and whether that is a small subset
>>>> of users with a simple workaround or a large user-impacting issue.
>>>>
>>>> 2. Agreed, the bug was reported on 7/21. However, I became aware of it
>>>> on 08/29 and submitted the fix on 08/30. I also brought it to the
>>>> notice of the community, you, and the 1.3 release manager (Roshani) on
>>>> the RC0 proposal thread. Also, I would focus on the issue and the user
>>>> impact rather than on who identified and who is fixing the issue.
>>>>
>>>> Based on my discussion with 2 users, I think this is an important
>>>> feature for them to see in Apache MXNet v1.3.0.
>>>>
>>>> Best,
>>>>
>>>> Sandeep
>>>>
>>>> [1] Workaround used by the user:
>>>>
>>>> net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json',
>>>>                                         ['data'])
>>>> params_fp16 = mx.nd.load('resnet34_fp16-0000.params')
>>>>
>>>> for k, v in params_fp16.items():
>>>>     new_key = k.split(':')[1]
>>>>     net_fp16.collect_params()[new_key].cast(v.dtype)
>>>>
>>>> net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)
>>>>
>>>> [2] Amazon Rekognition
>>>>
>>>> [3] User story: train a model -> cast it to FP16 -> save the model ->
>>>> loading the model back does not work. Users have to cast every
>>>> parameter with the workaround mentioned above [1].
>>>>
>>>> On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lupe...@gmail.com> wrote:
>>>>
>>>>> Hi Sheng,
>>>>>
>>>>> Addressing your questions:
>>>>>
>>>>> - "why this specific bug is more important than all the other known
>>>>> bugs, that this becomes a release blocker"
>>>>> I do not consider it to be more or less important than other fixes. It
>>>>> can be fixed and included in the release alongside the rest of the
>>>>> release content, right?
>>>>> From the description of the issue, it seems important since it is
>>>>> blocking users from loading models that were previously trained and
>>>>> saved. There is nothing stopping the community from including this fix
>>>>> in 1.3.0, alongside the rest of the features and fixes.
>>>>>
>>>>> - "The bug exists since SymbolBlock was introduced a year ago and has
>>>>> survived at least three releases, so this is not a regression."
>>>>> I do not think I said it is a regression. However, the fact that a bug
>>>>> existed before does not mean it is OK to release it rather than fix it.
>>>>>
>>>>> - "Timeline-wise, this bug was reported on 7/21, but was not reported
>>>>> as release-blocker in the release discussion thread until 8/31 [1].
>>>>> Neither its reporting as release-blocker nor its fix made it for the
>>>>> 8/3 code freeze."
>>>>> You are right, it would have been better to have this identified and
>>>>> fixed earlier, and included before the code freeze.
>>>>>
>>>>> - "The PR is still not ready yet as it doesn't have approval."
>>>>> I think it is waiting for your review.
>>>>>
>>>>> - "it would be great if you could provide some additional reasoning
>>>>> besides "X mentions the issue" or "fix was done by X""
>>>>> I have. Repeating what I wrote in my previous email for clarity: basic
>>>>> functionality is broken: loading a model (albeit one that was saved as
>>>>> non-FP32).
>>>>>
>>>>> So, yes - this issue seems to have been out there for a while and
>>>>> somehow went under the radar... but I think the key question is
>>>>> whether this blocks basic functionality in MXNet. I believe so, hence
>>>>> my -1 vote.
>>>>>
>>>>> Hagay
>>>>>
>>>>> On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <szha....@gmail.com> wrote:
>>>>>
>>>>>> Hi Hagay and Sandeep,
>>>>>>
>>>>>> Could you help us understand why this specific bug is more important
>>>>>> than all the other known bugs, that this becomes a release blocker?
>>>>>>
>>>>>> Some facts to consider:
>>>>>> - The bug exists since SymbolBlock was introduced a year ago and has
>>>>>> survived at least three releases, so this is not a regression.
>>>>>> - Timeline-wise, this bug was reported on 7/21, but was not reported
>>>>>> as release-blocker in the release discussion thread until 8/31 [1].
>>>>>> Neither its reporting as release-blocker nor its fix made it for the
>>>>>> 8/3 code freeze.
>>>>>> - The PR is still not ready yet as it doesn't have approval.
>>>>>>
>>>>>> Hagay, it would be great if you could provide some additional
>>>>>> reasoning besides "X mentions the issue" or "fix was done by X".
>>>>>> Thanks.
>>>>>>
>>>>>> -sz
>>>>>>
>>>>>> [1] https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
>>>>>>
>>>>>> On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lupe...@gmail.com> wrote:
>>>>>>
>>>>>>> Sandeep mentions the issue of an error when a user tries to load
>>>>>>> model params trained/saved as FP16:
>>>>>>> https://github.com/apache/incubator-mxnet/issues/11849
>>>>>>> The fix was done by Sandeep:
>>>>>>> https://github.com/apache/incubator-mxnet/pull/12412 and is ready
>>>>>>> to be cherry-picked into the release branch.
>>>>>>>
>>>>>>> This seems like a release blocker to me:
>>>>>>> - Basic functionality broken: loading a model (albeit one that was
>>>>>>> saved as non-FP32)
>>>>>>> - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
>>>>>>>
>>>>>>> -1 (non-binding)
>>>>>>>
>>>>>>> Hagay
>>>>>>>
>>>>>>> On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <sandeep.krishn...@gmail.com> wrote:
>>>>>>>
>>>>>>>> "-0"
>>>>>>>>
>>>>>>>> I believe the bug #11849
>>>>>>>> <https://github.com/apache/incubator-mxnet/issues/11849> (unable
>>>>>>>> to import non-FP32 models into Gluon), fixed in PR #12412
>>>>>>>> <https://github.com/apache/incubator-mxnet/pull/12412>, is
>>>>>>>> important for users. I would rather pick this fix into this
>>>>>>>> release than plan a minor release later.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Sandeep
>>>>>>>>
>>>>>>>> On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <chohy...@cs.washington.edu> wrote:
>>>>>>>>
>>>>>>>>> Actually, the command "git clone --recursive
>>>>>>>>> https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works
>>>>>>>>> fine now, never mind.
>>>>>>>>>
>>>>>>>>> On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <chohy...@cs.washington.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Unfortunately, MXNet was depending on a branch of TVM that is
>>>>>>>>>> now deleted. We will have to merge #12448
>>>>>>>>>> <https://github.com/apache/incubator-mxnet/pull/12448> before
>>>>>>>>>> the release.
>>>>>>>>>>
>>>>>>>>>> Background: see dmlc/tvm#1394
>>>>>>>>>> <https://github.com/dmlc/tvm/issues/1394>.
>>>>>>>>>>
>>>>>>>>>> Philip.
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <carinme...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Checked out the tag, built and tested the Clojure package. +1
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <roshaninagmo...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to propose a vote to release Apache MXNet
>>>>>>>>>>>> (incubating) version 1.3.0.RC0. Voting will start now
>>>>>>>>>>>> (Friday, Aug 31st) and end at 7:00 PM PDT, Wednesday,
>>>>>>>>>>>> Sept 5th.
>>>>>>>>>>>>
>>>>>>>>>>>> Link to release notes:
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/releases
>>>>>>>>>>>>
>>>>>>>>>>>> Link to release candidate 1.3.0.rc0:
>>>>>>>>>>>> https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
>>>>>>>>>>>>
>>>>>>>>>>>> View this page, click on "Build from Source", and use the
>>>>>>>>>>>> source code obtained from the 1.3.0.rc0 tag:
>>>>>>>>>>>> https://mxnet.incubator.apache.org/install/index.html
>>>>>>>>>>>>
>>>>>>>>>>>> Please remember to TEST first before voting accordingly:
>>>>>>>>>>>>
>>>>>>>>>>>> +1 = approve
>>>>>>>>>>>> +0 = no opinion
>>>>>>>>>>>> -1 = disapprove (provide reason)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Roshani
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sandeep Krishnamurthy
>>>>
>>>> --
>>>> Sandeep Krishnamurthy
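
[Editor's note] For readers skimming the archive, here is a minimal, self-contained sketch of the logic behind the workaround Sandeep quotes above: parameter keys saved by the symbol API carry an 'arg:'/'aux:' prefix that must be stripped, and each network parameter (created as float32 by default) must be cast to the saved dtype before the values can be loaded. NumPy arrays stand in for MXNet NDArrays, and the parameter names are hypothetical; this illustrates the pattern, not the MXNet implementation.

```python
import numpy as np

# Mimics the dict returned by mx.nd.load: keys carry an 'arg:'/'aux:' prefix
# and the values were saved in float16.
saved_fp16 = {
    "arg:conv0_weight": np.ones((3, 3), dtype=np.float16),
    "aux:bn0_mean": np.zeros((3,), dtype=np.float16),
}

# Mimics net.collect_params(): parameters are created as float32 by default,
# which is why loading fp16 values into them fails without an explicit cast.
net_params = {
    "conv0_weight": np.zeros((3, 3), dtype=np.float32),
    "bn0_mean": np.zeros((3,), dtype=np.float32),
}

for key, value in saved_fp16.items():
    new_key = key.split(":")[1]  # strip the 'arg:'/'aux:' prefix
    # Cast the parameter to the saved dtype first...
    net_params[new_key] = net_params[new_key].astype(value.dtype)
    # ...then copying the saved values in succeeds without a dtype mismatch.
    net_params[new_key][...] = value

assert all(p.dtype == np.float16 for p in net_params.values())
```

The per-parameter cast loop is the essence of the quoted workaround; the fix under discussion in PR #12412 aimed to make this unnecessary for users.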