Hi, I would like to bring a critical performance and stability patch for the existing Gluon DataLoader (https://github.com/apache/incubator-mxnet/pull/13447) into 1.4.0.
The PR is finished and is waiting for CI to pass. Steffen, could you help me add it to the tracked list?

Best,
Zhi

> On Nov 29, 2018, at 4:25 PM, Naveen Swamy <mnnav...@gmail.com> wrote:
>
> the tests are randomly failing in different stages
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
> This PR has failed 8 times so far
>
> On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel <steffenroc...@gmail.com> wrote:
>
>> Pedro - ok. Please add the PR to the v1.4.x branch after it is merged to master, and please
>> update the tracking page
>> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>.
>> Steffen
>>
>> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>>
>>> PR is ready from my side and passes the tests, unless somebody raises
>>> any concerns it's good to go.
>>>
>>> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <steffenroc...@gmail.com> wrote:
>>>>
>>>> Pedro - added to 1.4.0 tracking list
>>>> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
>>>>
>>>> Do you already have an ETA?
>>>> Steffen
>>>>
>>>> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>>>>
>>>>> Hi all.
>>>>>
>>>>> There are two important issues / fixes on my radar that should go in the
>>>>> next release:
>>>>>
>>>>> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
>>>>> There is a bug in shape inference on CPU when not using MKL, also we
>>>>> are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
>>>>> I'm finishing a fix for these issues in the above PR.
>>>>>
>>>>> 2) https://github.com/apache/incubator-mxnet/issues/13438
>>>>> We are seeing crashes due to unsafe setenv in multithreaded code.
>>>>> Setenv / getenv from multiple threads is not safe and is causing
>>>>> segfaults. This piece of code (the handlers in pthread_atfork) already
>>>>> caused a very difficult to diagnose hang in a previous release, where
>>>>> a fork inside cudnn would deadlock the engine.
>>>>>
>>>>> I would remove setenv from 2) as a mitigation, but we would need to
>>>>> check for regressions as we could be creating additional threads
>>>>> inside the engine.
>>>>>
>>>>> I would suggest that we address these two major issues before the next
>>>>> release.
>>>>>
>>>>> Pedro
>>>>>
>>>>> On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <steffenroc...@gmail.com> wrote:
>>>>>>
>>>>>> Dear MXNet community,
>>>>>>
>>>>>> I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
>>>>>> Sergey Kolychev will be co-managing the release and providing help from the
>>>>>> committers side.
>>>>>> A release candidate will be cut on November 29, 2018 and voting will start
>>>>>> December 7, 2018. Release notes have been drafted here [1]. If you have any
>>>>>> additional features in progress and would like to include them in this
>>>>>> release, please ensure they have been merged by November 27, 2018. Release
>>>>>> schedule is available here [2].
>>>>>>
>>>>>> Feel free to add any other comments/suggestions. Please help to review and
>>>>>> merge outstanding PRs and resolve issues impacting the quality of the
>>>>>> 1.4.0 release.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Steffen
>>>>>>
>>>>>> [1] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
>>>>>> [2] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
>>>>>>
>>>>>> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>>>>
>>>>>>> Spoke too soon[1], looks like others have been adding Turing support as
>>>>>>> well (thanks to those helping with this). I believe there are still a few
>>>>>>> changes we'd have to make to claim support though (mshadow CMake changes,
>>>>>>> PyPI package creation tweaks).
>>>>>>>
>>>>>>> 1: https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
>>>>>>>
>>>>>>> On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hey Steffen, I'd like to be able to merge this PR for version 1.4:
>>>>>>>> https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
>>>>>>>> regression in master which causes incorrect feature vectors to be output
>>>>>>>> when using the TensorRT feature. (Thanks to Nathalie for helping me track
>>>>>>>> down the root cause of the issue). I'm currently blocked on a CI issue I
>>>>>>>> haven't seen before, but hope to have it resolved by EOW.
>>>>>>>>
>>>>>>>> One call-out I would make is that we currently don't support Turing
>>>>>>>> architecture (sm_75). I've been slowly trying to add support, but I don't
>>>>>>>> think I'd have capacity to get this done by EOW. Does anyone feel strongly
>>>>>>>> we need this in the 1.4 release? From my perspective this will already be
>>>>>>>> a strong release without it.
>>>>>>>>
>>>>>>>> On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <steffenroc...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Patrick, let's target getting the PRs merged this week.
>>>>>>>>>
>>>>>>>>> Call for contributions from the community: Right now we have 10 PRs
>>>>>>>>> awaiting merge
>>>>>>>>> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
>>>>>>>>> and we have 61 open PRs awaiting review
>>>>>>>>> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
>>>>>>>>> I would appreciate it if you all can help to review the open PRs and the
>>>>>>>>> committers can drive the merge before code freeze for 1.4.0.
>>>>>>>>>
>>>>>>>>> The contributors on the Java API are making progress, but not all
>>>>>>>>> performance issues are resolved. With some luck it should be possible to
>>>>>>>>> code freeze towards end of this week.
>>>>>>>>>
>>>>>>>>> Are there other critical features/bugs/PRs you think need to be included in
>>>>>>>>> 1.4.0? If so, please communicate as soon as possible.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Steffen
>>>>>>>>>
>>>>>>>>> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.z...@intel.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks, Steffen. I think there is NO open issue blocking MKLDNN from
>>>>>>>>>> going GA now.
>>>>>>>>>>
>>>>>>>>>> BTW, several quantization related PRs (#13297, #13260) are under
>>>>>>>>>> review and I think they can be merged this week.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> --Patric
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
>>>>>>>>>>> Sent: Tuesday, November 20, 2018 2:57 AM
>>>>>>>>>>> To: dev@mxnet.incubator.apache.org
>>>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
>>>>>>>>>>>
>>>>>>>>>>> On Friday the contributors working on Java API discovered a potential
>>>>>>>>>>> performance problem with inference using Java API vs. Python. Investigation
>>>>>>>>>>> is ongoing.
>>>>>>>>>>> As the Java API is one of the main features for the upcoming release, I
>>>>>>>>>>> suggest postponing the code freeze towards end of this week.
>>>>>>>>>>>
>>>>>>>>>>> Please provide feedback and concerns about the change in dates for code
>>>>>>>>>>> freeze and 1.4.0 release. I will provide updates on progress resolving the
>>>>>>>>>>> potential performance problem.
>>>>>>>>>>>
>>>>>>>>>>> Patrick - do you think it is possible to resolve the remaining issues on
>>>>>>>>>>> MKL-DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Steffen
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <mecher...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'd like to remind everyone that 'code freeze' would mean cutting a
>>>>>>>>>>>> v1.4.x release branch and all following fixes would need to be backported.
>>>>>>>>>>>> Development on master can be continued as usual.
>>>>>>>>>>>>
>>>>>>>>>>>> Best
>>>>>>>>>>>> Anton
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <steffenroc...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Dear MXNet community,
>>>>>>>>>>>>> the agreed plan was to establish code freeze for 1.4.0 release
>>>>>>>>>>>>> today. As the 1.3.1 patch release is still ongoing I suggest
>>>>>>>>>>>>> postponing the code freeze to Friday 16th November 2018.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sergey Kolychev has agreed to act as co-release manager for all
>>>>>>>>>>>>> tasks which require committer privileges. If anybody is interested
>>>>>>>>>>>>> in volunteering as release manager - now is the time to speak up.
>>>>>>>>>>>>> Otherwise I will manage the release.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Steffen