I could not reproduce the error on an EC2 g3x8 instance, which makes it hard to debug. I also suspect it was caused by a resource usage limit on the CI instance.
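One way to test the resource hypothesis would be to log GPU memory right before the failing test runs on CI. The following is only a rough sketch under assumptions: it assumes nvidia-smi is on the PATH of the CI instance, and the helper names parse_gpu_memory and log_gpu_memory are hypothetical, not part of MXNet or its CI scripts.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" row per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (MiB) into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({
            "gpu": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return rows


def log_gpu_memory():
    """Print per-GPU memory usage; return [] if nvidia-smi is unavailable."""
    try:
        out = subprocess.check_output(QUERY, universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    rows = parse_gpu_memory(out)
    for row in rows:
        print("GPU {gpu}: {used_mib}/{total_mib} MiB used, "
              "{free_mib} MiB free".format(**row))
    return rows
```

Calling log_gpu_memory() at the start of the affected test would show whether the GPU was already close to full when the CUDNN algorithm selection failed.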
On Mon, Oct 1, 2018 at 10:40 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:

> It doesn't look like flakiness to me at first sight. I think it might be
> related to resource usage / allocation / leak in the worst case.
>
> Could be that there was not enough memory GPU memory at the time of test
> execution. But I'm just speculating, hence my original question.
>
> Pedro.
>
> On Mon, Oct 1, 2018 at 8:16 PM Lin Yuan <apefor...@gmail.com> wrote:
>
> > Hi Pedro,
> >
> > I also got this failure in my PR
> >
> > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11742/27/pipeline
> >
> > I was not able to identify the root cause of it from changelist. Are you
> > suggesting there is some flakiness in the master branch too?
> >
> > Thanks,
> >
> > Lin
> >
> > On Mon, Oct 1, 2018 at 4:55 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> >
> > > Hi
> > >
> > > I saw this failure on CI:
> > >
> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1697/pipeline
> > >
> > > Have you seen other cases where we fail to select the best CUDNN algorithm?
> > > In which circumstances this could happen, and do you think is a good idea
> > > to have one selected by default as a last resort?
> > >
> > > Pedro.