I see. Yeah, those setenv calls can probably be removed. I haven’t checked the source, but I would be surprised if OpenMP even looked at the environment variable after initial startup, since every environment lookup is a slow linear search.
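To illustrate what I mean, here is a minimal sketch of the read-once pattern. It is not taken from libgomp or MXNet, and the helper names read_env_int and cached_omp_num_threads are made up for this example. The idea is that a runtime reads OMP_NUM_THREADS a single time and answers every later query from the cached value:

```cpp
#include <climits>
#include <cstdlib>

// Hypothetical helper (not libgomp or MXNet code): parse a positive integer
// from an environment variable, returning `fallback` if it is unset or invalid.
inline int read_env_int(const char* name, int fallback) {
  const char* raw = std::getenv(name);  // searches the environment block
  if (raw == nullptr) return fallback;
  char* end = nullptr;
  const long value = std::strtol(raw, &end, 10);
  // Reject empty strings, trailing garbage, non-positive and oversized values.
  if (end == raw || *end != '\0' || value <= 0 || value > INT_MAX) {
    return fallback;
  }
  return static_cast<int>(value);
}

// Hypothetical accessor: the function-local static is initialized exactly once
// (thread-safely since C++11) on the first call, so the environment is never
// consulted again afterwards.
inline int cached_omp_num_threads() {
  static const int cached = read_env_int("OMP_NUM_THREADS", 1);
  return cached;
}
```

With this pattern, a setenv("OMP_NUM_THREADS", ...) issued after the first call has no effect on what cached_omp_num_threads() returns. If the OpenMP runtime behaves this way, the late setenv calls do nothing useful, but that is an assumption that still needs to be checked against the actual source.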

On Thu, Nov 29, 2018 at 8:09 AM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:

> Chris. The problem is with setenv, not with getenv. We don't want to
> remove any getenv call, just these misplaced setenvs:
>
> https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L61
>
> Please check the code above carefully and give us your feedback. Based
> on your email I think we don't yet have a common understanding of the
> root cause of this issue.
>
> Pedro.
>
> On Thu, Nov 29, 2018 at 4:02 PM Chris Olivier <cjolivie...@gmail.com> wrote:
> >
> > - getenv should be thread safe as long as nothing is calling putenv/setenv
> > in another thread (the environment doesn’t change) as stated here:
> >
> > http://www.cplusplus.com/reference/cstdlib/getenv/
> >
> > it’s a simple library call, so to be sure either way, one can check the
> > actual source and see (in case some particular implementation is acting in
> > a particularly thread-unsafe manner). This should be vetted before making
> > any high-impact decisions such as trying to go remove every getenv call in
> > the whole system.
> >
> > - locking after fork is possibly due to libgomp not supporting forking such
> > that after a fork, a call is made to release the blocked omp threads and
> > the main thread waits for the omp threads to finish, but the omp threads
> > belong to the pre-forked process and thus never execute, causing that
> > forked process to freeze. This behavior has been witnessed before.
> >
> > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> >
> > > Hi all.
> > >
> > > There are two important issues / fixes that should go in the next
> > > release in my radar:
> > >
> > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > There is a bug in shape inference on CPU when not using MKL, also we
> > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > > I'm finishing a fix for these issues in the above PR.
> > >
> > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > Setenv / getenv from multiple threads is not safe and is causing
> > > segfaults. This piece of code (the handlers in pthread_atfork) already
> > > caused a very difficult to diagnose hang in a previous release, where
> > > a fork inside cudnn would deadlock the engine.
> > >
> > > I would remove setenv from 2) as a mitigation, but we would need to
> > > check for regressions as we could be creating additional threads
> > > inside the engine.
> > >
> > > I would suggest that we address these two major issues before the next
> > > release.
> > >
> > > Pedro
> > >
> > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <steffenroc...@gmail.com> wrote:
> > >
> > > > Dear MXNet community,
> > > >
> > > > I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
> > > > Sergey Kolychev will be co-managing the release and providing help from the
> > > > committers side.
> > > > A release candidate will be cut on November 29, 2018 and voting will start
> > > > December 7, 2018. Release notes have been drafted here [1]. If you have any
> > > > additional features in progress and would like to include it in this
> > > > release, please assure they have been merged by November 27, 2018. Release
> > > > schedule is available here [2].
> > > >
> > > > Feel free to add any other comments/suggestions. Please help to review and
> > > > merge outstanding PR's and resolve issues impacting the quality of the
> > > > 1.4.0 release.
> > > >
> > > > Regards,
> > > >
> > > > Steffen
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > > >
> > > > [2]
> > > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > > >
> > > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> > > >
> > > > > Spoke too soon[1], looks like others have been adding Turing support as
> > > > > well (thanks to those helping with this). I believe there's still a few
> > > > > changes we'd have to make to claim support though (mshadow CMake changes,
> > > > > PyPi package creation tweaks).
> > > > >
> > > > > 1:
> > > > > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > > >
> > > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> > > > >
> > > > > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> > > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> > > > > > regression in master which causes incorrect feature vectors to be output
> > > > > > when using the TensorRT feature. (Thanks to Nathalie for helping me track
> > > > > > down the root cause of the issue). I'm currently blocked on a CI issue I
> > > > > > haven't seen before, but hope to have it resolved by EOW.
> > > > > >
> > > > > > One call-out I would make is that we currently don't support Turing
> > > > > > architecture (sm_75). I've been slowly trying to add support, but I don't
> > > > > > think I'd have capacity to do this done by EOW. Does anyone feel strongly
> > > > > > we need this in the 1.4 release? From my perspective this will already be
> > > > > > a strong release without it.
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <steffenroc...@gmail.com> wrote:
> > > > > >
> > > > > >> Thanks Patrick, lets target to get the PR's merged this week.
> > > > > >>
> > > > > >> Call for contributions from the community: Right now we have 10 PR
> > > > > >> awaiting merge
> > > > > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
> > > > > >> and we have 61 open PR awaiting review.
> > > > > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>
> > > > > >> I would appreciate if you all can help to review the open PR and the
> > > > > >> committers can drive the merge before code freeze for 1.4.0.
> > > > > >>
> > > > > >> The contributors on the Java API are making progress, but not all
> > > > > >> performance issues are resolved. With some luck it should be possible to
> > > > > >> code freeze towards end of this week.
> > > > > >>
> > > > > >> Are there other critical features/bugs/PR you think need to be included in
> > > > > >> 1.4.0? If so, please communicate as soon as possible.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Steffen
> > > > > >>
> > > > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.z...@intel.com> wrote:
> > > > > >>
> > > > > >> > Thanks, Steffen. I think there is NO open issue to block the MKLDNN to GA
> > > > > >> > now.
> > > > > >> >
> > > > > >> > BTW, several quantization related PRs (#13297,#13260) are under the review
> > > > > >> > and I think it can be merged in this week.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> > --Patric
> > > > > >> >
> > > > > >> > > -----Original Message-----
> > > > > >> > > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > > > > >> > > To: dev@mxnet.incubator.apache.org
> > > > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
> > > > > >> > >
> > > > > >> > > On Friday the contributors working on Java API discovered a potential
> > > > > >> > > performance problem with inference using Java API vs. Python. Investigation
> > > > > >> > > is ongoing.
> > > > > >> > > As the Java API is one of the main features for the upcoming release, I
> > > > > >> > > suggest to post-pone the code freeze towards end of this week.
> > > > > >> > >
> > > > > >> > > Please provide feedback and concern about the change in dates for code
> > > > > >> > > freeze and 1.4.0 release. I will provide updates on progress resolving the
> > > > > >> > > potential performance problem.
> > > > > >> > >
> > > > > >> > > Patrick - do you think it is possible to resolve the remaining issues on
> > > > > >> > > MKL-DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> > > > > >> > >
> > > > > >> > > Regards,
> > > > > >> > > Steffen
> > > > > >> > >
> > > > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <mecher...@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > > > I'd like to remind everyone that 'code freeze' would mean cutting a
> > > > > >> > > > v1.4.x release branch and all following fixes would need to be backported.
> > > > > >> > > > Development on master can be continued as usual.
> > > > > >> > > >
> > > > > >> > > > Best
> > > > > >> > > > Anton
> > > > > >> > > >
> > > > > >> > > > On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <steffenroc...@gmail.com> wrote:
> > > > > >> > > >
> > > > > >> > > > > Dear MXNet community,
> > > > > >> > > > > the agreed plan was to establish code freeze for 1.4.0 release
> > > > > >> > > > > today. As the 1.3.1 patch release is still ongoing I suggest to
> > > > > >> > > > > post-pone the code freeze to Friday 16th November 2018.
> > > > > >> > > > >
> > > > > >> > > > > Sergey Kolychev has agreed to act as co-release manager for all
> > > > > >> > > > > tasks which require committer privileges. If anybody is interested
> > > > > >> > > > > to volunteer as release manager - now is the time to speak up.
> > > > > >> > > > > Otherwise I will manage the release.
> > > > > >> > > > >
> > > > > >> > > > > Regards,
> > > > > >> > > > > Steffen