Is there a plan for adding those CUDA 8 tests back to CI? What about CUDA 7?
There were a few build problems in the past few weeks due to lack of CI
coverage:
- https://github.com/apache/incubator-mxnet/pull/10710 was found during
  1.2 rc voting
- https://github.com/apache/incubator-mxnet/issues/10981 was reported by
  a user with CUDA 7

Having these covered in CI will help catch the issues early. I don't recall
if we decided to drop CUDA 7 support for MXNet.

Best,
Haibin

On Wed, Mar 21, 2018 at 6:32 AM, Marco de Abreu <[email protected]> wrote:

> Hello,
>
> the migration has just been completed and we're now running our UNIX based
> slaves on CUDA 9.1 with CuDNN 7. The commit is available at
> https://github.com/apache/incubator-mxnet/commit/b0a6760efa141aeca87b03ecf34dae924bd1af46.
>
> No jobs have been interrupted by this migration. If you encounter any
> errors, please reach back to me.
>
> Best regards,
> Marco
>
> On Tue, Mar 20, 2018 at 11:20 PM, Marco de Abreu <[email protected]> wrote:
>
> > Hello,
> >
> > the results of this vote are as follows:
> >
> > +1:
> > Jun
> > Anirudh
> > Hao
> > Marco
> >
> > 0:
> > Chris
> >
> > -1:
> > Naveen (veto recalled as of
> > https://lists.apache.org/thread.html/242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%3Cdev.mxnet.apache.org%3E)
> >
> > Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on
> > UNIX slaves and work on integration tests for CUDA 8 in the long term,
> > this vote counts as PASSED.
> >
> > The PR for this change is available at
> > https://github.com/apache/incubator-mxnet/pull/10108. I have developed
> > and tested the new slaves in our test environment and everything looks
> > promising so far. The plan is as follows:
> >
> >    1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved
> >    to allow self-merge - CI can’t pass until slaves have been upgraded.
> >    2. Replace all existing slaves with new upgraded slaves.
> >    3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to
> >    merge necessary changes into master.
> >
> > IMPORTANT: The migration will happen tomorrow, so please expect some
> > delay in job execution - the CI website will be unaffected. Ideally, no
> > jobs should fail - in case they do, please feel free to retrigger them
> > by using an empty commit. In case of any errors appearing after the
> > upgrade, don't hesitate to contact me!
> >
> > Best regards,
> > Marco
> >
> > On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy <[email protected]> wrote:
> >
> >> Yes, for short-term.
> >>
> >> On Monday, March 19, 2018, Chris Olivier <[email protected]> wrote:
> >>
> >> > In the short term, Naveen, are you ok with Linux running CUDA 9 and
> >> > Windows CUDA 8 in order to get CUDA version coverage?
> >> >
> >> > On 2018/03/16 21:09:09, Marco de Abreu <[email protected]> wrote:
> >> > > Thanks for your input. How would you propose to proceed in terms
> >> > > of a timeline in case this vote succeeds? I don't really have time
> >> > > to work on a nightly setup right now. Would anybody in the
> >> > > community be able to help me out here or shall we wait with the
> >> > > migration until a nightly setup for CUDA 8 is up?
> >> > >
> >> > > -Marco
> >> > >
> >> > > On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > +1 to the suggestion of testing CUDA8 in a few nightly instances
> >> > > > and using CUDA9 for most instances in CI.
> >> > > >
> >> > > > Bhavin Thaker.
> >> > > >
> >> > > > On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy <[email protected]>
> >> > > > wrote:
> >> > > >
> >> > > > > I think it's best to add support for CUDA 9.0 while retaining
> >> > > > > existing support for CUDA 8; code might regress when you
> >> > > > > remove it and create more work to add CUDA 8 support back.
> >> > > > >
> >> > > > > On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu
> >> > > > > <[email protected]> wrote:
> >> > > > >
> >> > > > > > Yeah, sorry Chris, mixed up the names.
> >> > > > > >
> >> > > > > > @Naveen: Would you be fine with doing the switch now and
> >> > > > > > adding integration tests later or is this a hard constraint
> >> > > > > > for you?
> >> > > > > >
> >> > > > > > On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier
> >> > > > > > <[email protected]> wrote:
> >> > > > > >
> >> > > > > > > Isn't the Titan V the Volta and not the Tesla?
> >> > > > > > >
> >> > > > > > > On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy
> >> > > > > > > <[email protected]> wrote:
> >> > > > > > >
> >> > > > > > > > Marco,
> >> > > > > > > > My -1 vote is for dropping support for CUDA 8 and not for
> >> > > > > > > > adding CUDA 9. CUDA 9.0 support for MXNet was added on
> >> > > > > > > > Oct 30, 2017; I think that not all users have switched
> >> > > > > > > > to CUDA 9.0 yet.
> >> > > > > > > >
> >> > > > > > > > Look at the earlier discussion on the same topic:
> >> > > > > > > >
> >> > > > > > > > https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E
> >> > > > > > > >
> >> > > > > > > > On Wed, Mar 14, 2018 at 10:14 AM, Marco de Abreu
> >> > > > > > > > <[email protected]> wrote:
> >> > > > > > > >
> >> > > > > > > > > Right, the code changes would not be validated against
> >> > > > > > > > > CUDA 8.0 as part of the PR process.
> >> > > > > > > > >
> >> > > > > > > > > I don't have any numbers, but it's pretty unlikely that
> >> > > > > > > > > anybody is still using CUDA 8.0. According to
> >> > > > > > > > > https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the
> >> > > > > > > > > devices which are not supported by CUDA 9 are those
> >> > > > > > > > > of the Fermi architecture, which was released in April
> >> > > > > > > > > 2010. These GPUs are way too old, so I think we're safe
> >> > > > > > > > > with not covering them specifically - this does not
> >> > > > > > > > > mean we're entirely deprecating them.
> >> > > > > > > > >
> >> > > > > > > > > One thing to note here is that we're not testing CUDA 9
> >> > > > > > > > > as of now. Considering that the Tesla architecture
> >> > > > > > > > > (Titan V, V100) requires at least CUDA 9 and those are
> >> > > > > > > > > probably the most widely used GPUs for Deep Learning,
> >> > > > > > > > > we'd probably be covering a wider user-base in
> >> > > > > > > > > comparison to CUDA 8 if we make that switch.
> >> > > > > > > > >
> >> > > > > > > > > -Marco
> >> > > > > > > > >
> >> > > > > > > > > On Wed, Mar 14, 2018 at 5:59 PM, Naveen Swamy
> >> > > > > > > > > <[email protected]> wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Does this mean that MXNet users who use CUDA 8.0 will
> >> > > > > > > > > > not be supported (since you would stop testing CUDA
> >> > > > > > > > > > 8.0)? I suggest we at least have nightly tests for
> >> > > > > > > > > > CUDA 8.0.
> >> > > > > > > > > >
> >> > > > > > > > > > Do you have a sense of how many users are using CUDA
> >> > > > > > > > > > 8.0/9.0?
> >> > > > > > > > > >
> >> > > > > > > > > > -1
> >> > > > > > > > > >
> >> > > > > > > > > > On Wed, Mar 14, 2018 at 9:50 AM, Chris Olivier
> >> > > > > > > > > > <[email protected]> wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > +0
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Wed, Mar 14, 2018 at 9:45 AM, Jin, Hao
> >> > > > > > > > > > > <[email protected]> wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > +1
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On 3/14/18, 9:04 AM, "Anirudh" <[email protected]>
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > +1
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On Mar 14, 2018 8:56 AM, "Wu, Jun"
> >> > > > > > > > > > > > <[email protected]> wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > +1
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On 3/14/18, 8:52 AM, "Marco de Abreu"
> >> > > > > > > > > > > > > <[email protected]> wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Hello,
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > this is a vote to upgrade our CI environment
> >> > > > > > > > > > > > > from the current CUDA 8.0 with CuDNN 5.0 to
> >> > > > > > > > > > > > > CUDA 9.1 with CuDNN 7.0. Reason being that NVCC
> >> > > > > > > > > > > > > under CUDA 8 does not support the Volta GPUs
> >> > > > > > > > > > > > > used in AWS P3 instances, thus limiting our
> >> > > > > > > > > > > > > test capabilities. More details are available
> >> > > > > > > > > > > > > at [1].
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > In order to introduce support for Quantization
> >> > > > > > > > > > > > > [2], I'd like to perform a system-wide upgrade.
> >> > > > > > > > > > > > > This should have no negative impact on our
> >> > > > > > > > > > > > > users but rather makes sure that we're actually
> >> > > > > > > > > > > > > testing with the latest versions. The PR is
> >> > > > > > > > > > > > > available at [3].
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > This means that we would stop verifying CUDA 8
> >> > > > > > > > > > > > > and CuDNN 5.0 as part of our PR process. At a
> >> > > > > > > > > > > > > later point in time, this could be picked up as
> >> > > > > > > > > > > > > a candidate for an integration test as part of
> >> > > > > > > > > > > > > the nightly suite.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > This is a lazy vote, ending on 17th of March,
> >> > > > > > > > > > > > > 2018 at 17:00 (UTC +1).
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Best regards,
> >> > > > > > > > > > > > > Marco
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > [1]: https://issues.apache.org/jira/browse/MXNET-99
> >> > > > > > > > > > > > > [2]: https://github.com/apache/incubator-mxnet/pull/9552
> >> > > > > > > > > > > > > [3]: https://github.com/apache/incubator-mxnet/pull/10108
