It doesn't look like flakiness to me at first sight. I think it might be related to resource usage / allocation / leak in the worst case.
Could be that there was not enough GPU memory at the time of test execution. But I'm just speculating, hence my original question.

Pedro.

On Mon, Oct 1, 2018 at 8:16 PM Lin Yuan <[email protected]> wrote:

> Hi Pedro,
>
> I also got this failure in my PR
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11742/27/pipeline
>
> I was not able to identify the root cause of it from the changelist. Are you
> suggesting there is some flakiness in the master branch too?
>
> Thanks,
>
> Lin
>
> On Mon, Oct 1, 2018 at 4:55 PM Pedro Larroy <[email protected]>
> wrote:
>
> > Hi
> >
> > I saw this failure on CI:
> >
> > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1697/pipeline
> >
> > Have you seen other cases where we fail to select the best CUDNN
> > algorithm?
> > In which circumstances could this happen, and do you think it is a good idea
> > to have one selected by default as a last resort?
> >
> > Pedro.
