Re: CUDA Support [DISCUSS]

kellen sunderland Sat, 06 Jan 2018 09:32:15 -0800

I like that proposal, Bhavin. I'm also interested to see what the other
community members think.


On Sat, Jan 6, 2018 at 6:27 PM, Bhavin Thaker <[email protected]>
wrote:

> Hi Kellen,
>
> Here is my opinion and stance on this:
>
> I see no need to test on CUDA8 in the Apache MXNet CI, especially since
> CUDA9 is backward compatible with earlier Nvidia hardware generations.
> There is a time and resource cost to maintaining the various combinations
> in the CI, so I am NOT in favor of running CUDA8 in CI unless there is a
> technical reason/requirement for it. This approach helps encourage users to
> move to the latest CUDA version and thus keep the open-source community’s
> maintenance cost low for the generic option of CUDA9.
>
> For example, if a user opens a GitHub issue about a problem with Apache
> MXNet and CUDA8, I would ask the user to test it with CUDA9. If the problem
> happens only on CUDA8, then a volunteer in the community may work on it. If
> the problem happens on CUDA9 as well, then, in my humble opinion, the
> problem must be fixed by the community. In short, I propose that the MXNet
> CI run tests only with the latest CUDA9 version and NOT CUDA8.
>
> I am eager to hear alternate viewpoints/corrections from folks other than
> Kellen and me.
>
> Bhavin Thaker.
>
> On Sat, Jan 6, 2018 at 8:24 AM kellen sunderland <
> [email protected]> wrote:
>
> > Thanks for the thoughts, Bhavin. Supporting the latest release would
> > also be an option, and it would be easier from a support point of view.
> >
> > "2) I think your question probably is what should be tested by the Apache
> > MXNet CI and NOT what is supported by Apache MXNet, correct?"
> >
> > I view these two things as closely related, if not equivalent. If we
> > don't run at least basic tests on old versions of CUDA, I think some
> > issues will slip through. That being said, we can rely on users to
> > report these issues, and chances are we'll be able to provide
> > backward-compatible patches. At a minimum, I'd recommend that we run
> > tests on all supported CUDA versions before a release.
> >
> > -Kellen
> >
> >
> > On Sat, Jan 6, 2018 at 5:05 PM, Bhavin Thaker <[email protected]>
> > wrote:
> >
> > > Hi Kellen,
> > >
> > > 1) Does Apache MXNet (Incubating) have a support matrix? I think the
> > > answer is no, because I don’t know where it is documented. One of the
> > > mentors told me earlier that the community uses and modifies the
> > > open-source project as per their individual requirements or those of
> > > the community. As far as I know, there is no single entity that is
> > > responsible for supporting anything in MXNet; corrections to my
> > > understanding are welcome.
> > >
> > > 2) I think your question probably is what should be tested by the
> > > Apache MXNet CI and NOT what is supported by Apache MXNet, correct?
> > >
> > > If yes, I propose testing only the latest CUDA9 and the respective
> > > latest cuDNN version in the MXNet CI, since CUDA9 is backward
> > > compatible with earlier Nvidia hardware generations.
> > >
> > > I would like to hear reasons why this would not work.
> > >
> > > I have commented on the GitHub issue as well:
> > > https://github.com/apache/incubator-mxnet/issues/8805
> > >
> > > Bhavin Thaker.
> > >
> > > On Sat, Jan 6, 2018 at 3:30 AM kellen sunderland <
> > > [email protected]> wrote:
> > >
> > > > Hello all, I'd like to propose that we nail down exactly which
> > > > versions of CUDA we're supporting. We can then ensure that we've got
> > > > good test coverage for those specific versions in CI. At the moment
> > > > our policy is ambiguous: for example, when do we drop support for old
> > > > versions? As a result, we could cut a release promising to support a
> > > > certain version of CUDA, then retroactively drop support after we
> > > > find an issue.
> > > >
> > > > I'd like to propose that we officially support the N and N-1 major
> > > > versions of CUDA, where N is the most recent major version release.
> > > > In addition, we can do our best to support the libraries that are
> > > > available for download for those versions. Supporting these CUDA
> > > > versions would also dictate which hardware we support in terms of
> > > > compute capability (of course, resource constraints would also play
> > > > some role in our ability to support some hardware).
> > > >
> > > > As an example, this would mean that we'd currently officially
> > > > support CUDA 9.* and 8. This would imply we support cuDNN 5.1 through
> > > > 7, as those libraries are available for CUDA 8 and 9. It would also
> > > > mean we support compute capabilities 3.0 to 7.x (Kepler, Maxwell,
> > > > Pascal, Volta), taking the more restrictive hardware requirements of
> > > > CUDA 9 into account.
> > > >
> > > > What do you all think? Would this be a reasonable support strategy?
> > > > Are these the versions you'd like to see covered in CI?
> > > >
> > > > -Kellen
> > > >
> > > > A relevant issue:
> > > > https://github.com/apache/incubator-mxnet/issues/8805
> > > >
> > >
> >
>
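To make the N/N-1 rule from the original proposal concrete, here is a minimal Python sketch of how the supported cuDNN range and minimum compute capability fall out of it. The per-version data (cuDNN releases and minimum compute capability for CUDA 8 and 9) is an illustrative assumption based on the figures quoted in this thread, and the helpers `supported_majors` and `support_matrix` are hypothetical; this is not an official MXNet support matrix or CI configuration.

```python
from typing import Dict, List, Tuple

# Illustrative data only (an assumption based on the figures in this thread,
# January 2018): CUDA major version -> (cuDNN releases available for it,
# minimum GPU compute capability it supports).
CUDA_INFO: Dict[int, Tuple[List[str], float]] = {
    8: (["5.1", "6", "7"], 2.0),
    9: (["7"], 3.0),
}


def supported_majors(released: List[int], window: int = 2) -> List[int]:
    """Return the newest `window` (>= 1) CUDA major versions: N, N-1, ..."""
    return sorted(set(released))[-window:]


def support_matrix(
    released: List[int],
    info: Dict[int, Tuple[List[str], float]],
    window: int = 2,
) -> Tuple[List[int], List[str], float]:
    """Derive supported CUDA majors, cuDNN versions and minimum compute capability."""
    majors = supported_majors(released, window)
    # Union of the cuDNN releases available for any supported CUDA version,
    # sorted numerically ("5.1" < "6" < "7").
    cudnn = sorted(
        {v for m in majors for v in info[m][0]},
        key=lambda s: tuple(int(p) for p in s.split(".")),
    )
    # The most restrictive (highest) minimum compute capability wins, since a
    # supported GPU must work with every supported CUDA version.
    min_cc = max(info[m][1] for m in majors)
    return majors, cudnn, min_cc


if __name__ == "__main__":
    print(support_matrix([7, 8, 9], CUDA_INFO))            # ([8, 9], ['5.1', '6', '7'], 3.0)
    print(support_matrix([7, 8, 9], CUDA_INFO, window=1))  # ([9], ['7'], 3.0)
```

Setting `window=1` models the alternative raised in the thread of testing only the latest CUDA version.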
