Re: CUDNN algorithm selection failure

Lin Yuan Mon, 01 Oct 2018 22:58:57 -0700

I could not reproduce the error on an EC2 g3x8 instance, which makes it hard to
debug. I also suspect it was due to a resource usage limit on the CI instance.


On Mon, Oct 1, 2018 at 10:40 PM Pedro Larroy <[email protected]>
wrote:

> It doesn't look like flakiness to me at first sight. I think it might be
> related to resource usage / allocation / leak in the worst case.
>
> It could be that there was not enough GPU memory at the time of test
> execution. But I'm just speculating, hence my original question.
>
> Pedro.
>
> On Mon, Oct 1, 2018 at 8:16 PM Lin Yuan <[email protected]> wrote:
>
> > Hi Pedro,
> >
> > I also got this failure in my PR
> >
> >
> > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11742/27/pipeline
> >
> > I was not able to identify the root cause from the changelist. Are you
> > suggesting there is some flakiness in the master branch too?
> >
> > Thanks,
> >
> > Lin
> >
> > On Mon, Oct 1, 2018 at 4:55 PM Pedro Larroy <[email protected]> wrote:
> >
> > > Hi
> > >
> > > I saw this failure on CI:
> > >
> > >
> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1697/pipeline
> > >
> > > Have you seen other cases where we fail to select the best CUDNN
> > > algorithm? In which circumstances could this happen, and do you think
> > > it is a good idea to have one selected by default as a last resort?
> > >
> > >
> > > Pedro.
> > >
> >
>
