I like that proposal, Bhavin. I'm also interested to see what the other community members think.
On Sat, Jan 6, 2018 at 6:27 PM, Bhavin Thaker wrote:

> Hi Kellen,
>
> Here is my opinion and stand on this:
>
> I see no need to test on CUDA8 in Apache MXNet CI, especially when
> CUDA9 is backward compatible with earlier Nvidia hardware generations.
> There is a time and resource cost to maintaining the various
> combinations in the CI, so I am NOT in favor of running CUDA8 in CI
> unless there is a technical reason/requirement for it. This approach
> helps to encourage users to move to the latest CUDA version and thus
> keeps the open-source community’s maintenance cost low for the generic
> option of CUDA9.
>
> For example: if a user opens a GitHub issue with Apache MXNet and
> CUDA8, I would ask the user to test it with CUDA9. If the problem
> happens only on CUDA8, then a volunteer in the community may work on
> it. If the problem happens on CUDA9 as well, then, in my humble
> opinion, the problem must be fixed by the community. In short, I
> propose that the MXNet CI run tests only with the latest CUDA9 version
> and NOT CUDA8.
>
> I am eager to hear alternate viewpoints/corrections from folks other
> than Kellen and me.
>
> Bhavin Thaker.
>
> On Sat, Jan 6, 2018 at 8:24 AM kellen sunderland wrote:
>
> > Thanks for the thoughts, Bhavin. Supporting the latest release would
> > also be an option, and it would be easier from a support point of
> > view.
> >
> > "2) I think your question probably is what should be tested by the
> > Apache MXNet CI and NOT what is supported by Apache MXNet, correct?"
> >
> > I view these two things as being closely related, if not equivalent.
> > If we don't run at least basic tests of old versions of CUDA, I think
> > there will be issues that slip through. That being said, we can rely
> > on users to report these issues, and chances are we'll be able to
> > provide backwards compatible patches. At a minimum, I'd recommend we
> > run tests on all supported CUDA versions before a release.
> >
> > -Kellen
> >
> > On Sat, Jan 6, 2018 at 5:05 PM, Bhavin Thaker wrote:
> >
> > > Hi Kellen,
> > >
> > > 1) Does Apache MXNet (Incubating) have a support matrix? I think
> > > the answer is no, because I don’t know where it is documented. One
> > > of the mentors told me earlier that the community uses and modifies
> > > the open-source project as per their individual requirements or
> > > those of the community. As far as I know, there is no single entity
> > > that is responsible for supporting something in MXNet. Corrections
> > > to my understanding are welcome.
> > >
> > > 2) I think your question probably is what should be tested by the
> > > Apache MXNet CI and NOT what is supported by Apache MXNet, correct?
> > >
> > > If yes, I propose testing only the latest CUDA9 and the respective
> > > latest cuDNN version in the MXNet CI, since CUDA9 is backward
> > > compatible with earlier Nvidia hardware generations.
> > >
> > > I would like to hear reasons why this would not work.
> > >
> > > I have commented on the GitHub issue as well:
> > > https://github.com/apache/incubator-mxnet/issues/8805
> > >
> > > Bhavin Thaker.
> > >
> > > On Sat, Jan 6, 2018 at 3:30 AM kellen sunderland wrote:
> > >
> > > > Hello all, I'd like to propose that we nail down exactly which
> > > > versions of CUDA we're supporting. We can then ensure that we've
> > > > got good test coverage for those specific versions in CI. At the
> > > > moment it's ambiguous what our current policy is, i.e. when do we
> > > > drop support for old versions? As a result we potentially cut a
> > > > release promising to support a certain version of CUDA, then
> > > > retroactively drop support after we find an issue.
> > > >
> > > > I'd like to propose that we officially support the N and N-1
> > > > versions of CUDA, where N is the most recent major version
> > > > release. In addition, we can do our best to support libraries
> > > > that are available for download for those versions. Supporting
> > > > these CUDA versions would also dictate which hardware we support
> > > > in terms of compute capability (of course, resource constraints
> > > > would also play some role in our ability to support some
> > > > hardware).
> > > >
> > > > As an example, this would mean that currently we'd officially
> > > > support CUDA 9.* and 8. This would imply we support CUDNN 5.1
> > > > through 7, as those libraries are available for CUDA 8 and 9. It
> > > > would also mean we support compute capabilities 3.0-7.x (Kepler,
> > > > Maxwell, Pascal, Volta), taking the more restrictive hardware
> > > > requirements of CUDA 9 into account.
> > > >
> > > > What do you all think? Would this be a reasonable support
> > > > strategy? Are these the versions you'd like to see covered in CI?
> > > >
> > > > -Kellen
> > > >
> > > > A relevant issue:
> > > > https://github.com/apache/incubator-mxnet/issues/8805
