[LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0

Marco de Abreu Tue, 20 Mar 2018 15:21:20 -0700

Hello,

the results of this vote are as follows:


+1:
Jun
Anirudh
Hao
Marco

0:
Chris

-1:
Naveen (veto recalled as of
https://lists.apache.org/thread.html/242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%3Cdev.mxnet.apache.org%3E
)

Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on
UNIX slaves and work on integration tests for CUDA 8 in the long term, this
vote counts as PASSED.

The PR for this change is available at
https://github.com/apache/incubator-mxnet/pull/10108. I have developed and
tested the new slaves in our test environment and everything looks
promising so far. The plan is as follows:

   1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved to
   allow self-merge – CI can’t pass until slaves have been upgraded.
   2. Replace all existing slaves with new upgraded slaves.
   3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to
   merge necessary changes into master.

IMPORTANT: The migration will happen tomorrow, so please expect some delay
in job execution - the CI website will be unaffected. Ideally, no jobs
should fail - in case they do, please feel free to retrigger them by using
an empty commit. In case of any errors appearing after the upgrade, don't
hesitate to contact me!
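
For anyone unsure how to retrigger a job with an empty commit, here is a minimal sketch. It assumes the PR branch is already checked out locally and that the remote is called "origin"; the branch name, the commit message, and the helper names are placeholders made up for illustration, not part of the CI setup. The only git feature relied on is the standard `--allow-empty` option of `git commit`.

```python
import shlex
import subprocess


def retrigger_commands(branch, remote="origin", message="Retrigger CI"):
    """Build the git commands that push an empty commit to `branch`.

    `branch`, `remote` and `message` are illustrative placeholders.
    """
    return [
        # --allow-empty lets git record a commit that changes no files.
        ["git", "commit", "--allow-empty", "-m", message],
        # Pushing the new commit updates the PR, which starts a new CI run.
        ["git", "push", remote, branch],
    ]


def retrigger_ci(branch, remote="origin", dry_run=True):
    """Print the commands (dry run) or execute them in the current repo."""
    commands = retrigger_commands(branch, remote)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if git fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    retrigger_ci("my-feature-branch")
```

Running it with dry_run=False from inside the checked-out PR branch performs the commit and the push; the dry run is the default so that nothing is pushed by accident.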

Best regards,
Marco

On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy <[email protected]> wrote:

> Yes, for short-term.
>
> On Monday, March 19, 2018, Chris Olivier <[email protected]> wrote:
>
> > In the short term, Naveen, are you ok with Linux running CUDA 9 and
> > Windows CUDA 8 in order to get CUDA version coverage?
> >
> > On 2018/03/16 21:09:09, Marco de Abreu <[email protected]>
> > wrote:
> > > Thanks for your input. How would you propose to proceed in terms of a
> > > timeline in case this vote succeeds? I don't really have time to work
> > > on a nightly setup right now. Would anybody in the community be able to
> > > help me out here or shall we wait with the migration until a nightly
> > > setup for CUDA 8 is up?
> > >
> > > -Marco
> > >
> > > On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker <[email protected]
> >
> > > wrote:
> > >
> > > > > +1 to the suggestion of testing CUDA8 in a few nightly instances and
> > > > > using CUDA9 for most instances in CI.
> > > >
> > > > Bhavin Thaker.
> > > >
> > > > > On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy <[email protected]> wrote:
> > > >
> > > > > > I think it's best to add support for CUDA 9.0 while retaining the
> > > > > > existing support for CUDA 8; code might regress when you remove it,
> > > > > > creating more work to add CUDA 8 support back.
> > > > >
> > > > > On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Yeah, sorry Chris, mixed up the names.
> > > > > >
> > > > > > > @Naveen: Would you be fine with doing the switch now and adding
> > > > > > > integration tests later or is this a hard constraint for you?
> > > > > >
> > > > > > > On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier <[email protected]> wrote:
> > > > > > >
> > > > > > > > Isn't the Titan V the Volta and not the Tesla?
> > > > > > >
> > > > > > > > On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy <[email protected]> wrote:
> > > > > > >
> > > > > > > > Marco,
> > > > > > > > > My -1 vote is against dropping support for CUDA 8, not against
> > > > > > > > > adding CUDA 9. CUDA 9.0 support for MXNet was added on Oct 30,
> > > > > > > > > 2017; I think not all users may have switched to CUDA 9.0 yet.
> > > > > > > >
> > > > > > > > Look at the earlier discussion on the same topic
> > > > > > > >
> > > > > > > > > https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E
> > > > > > > >
> > > > > > > > On Wed, Mar 14, 2018 at 10:14 AM, Marco de Abreu <
> > > > > > > > [email protected]> wrote:
> > > > > > > >
> > > > > > > > > > Right, the code changes would not be validated against CUDA 8.0
> > > > > > > > > > as part of the PR process.
> > > > > > > > >
> > > > > > > > > > I don't have any numbers, but it's pretty unlikely that anybody
> > > > > > > > > > is still using CUDA 8.0. According to
> > > > > > > > > > https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the devices
> > > > > > > > > > which are not supported by CUDA 9 are under the Fermi
> > > > > > > > > > architecture, which was released in April 2010. These GPUs are
> > > > > > > > > > way too old, so I think we're safe with not covering them
> > > > > > > > > > specifically - this does not mean we're entirely deprecating
> > > > > > > > > > them.
> > > > > > > > >
> > > > > > > > > > One thing to note here is that we're not testing CUDA 9 as of
> > > > > > > > > > now. Considering that the Tesla architecture (Titan V, V100)
> > > > > > > > > > requires at least CUDA 9 and those are probably the most widely
> > > > > > > > > > used GPUs for Deep Learning, we'd probably be covering a wider
> > > > > > > > > > user-base in comparison to CUDA 8 if we make that switch.
> > > > > > > > >
> > > > > > > > > -Marco
> > > > > > > > >
> > > > > > > > > > On Wed, Mar 14, 2018 at 5:59 PM, Naveen Swamy <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > > Does this mean that MXNet users who use CUDA 8.0 will not be
> > > > > > > > > > > supported (since you will stop testing CUDA 8.0)? I suggest we
> > > > > > > > > > > at least have nightly tests for CUDA 8.0.
> > > > > > > > > > >
> > > > > > > > > > > Do you have a sense of how many users are using CUDA 8.0/9.0?
> > > > > > > > > >
> > > > > > > > > > -1
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 14, 2018 at 9:50 AM, Chris Olivier <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > +0
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 14, 2018 at 9:45 AM, Jin, Hao <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1
> > > > > > > > > > > >
> > > > > > > > > > > > > On 3/14/18, 9:04 AM, "Anirudh" <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >     +1
> > > > > > > > > > > >
> > > > > > > > > > > > >     On Mar 14, 2018 8:56 AM, "Wu, Jun" <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >     > +1
> > > > > > > > > > > >     >
> > > > > > > > > > > > >     > On 3/14/18, 8:52 AM, "Marco de Abreu" <[email protected]> wrote:
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >     Hello,
> > > > > > > > > > > >     >
> > > > > > > > > > > > >     >     this is a vote to upgrade our CI environment from the current CUDA 8.0
> > > > > > > > > > > > >     >     with CuDNN 5.0 to CUDA 9.1 with CuDNN 7.0. The reason is that NVCC under
> > > > > > > > > > > > >     >     CUDA 8 does not support the Volta GPUs used in AWS P3 instances, which
> > > > > > > > > > > > >     >     limits our test capabilities. More details are available at [1].
> > > > > > > > > > > >     >
> > > > > > > > > > > > >     >     In order to introduce support for Quantization [2], I'd like to perform a
> > > > > > > > > > > > >     >     system-wide upgrade. This should have no negative impact on our users but
> > > > > > > > > > > > >     >     rather makes sure that we're actually testing with the latest versions.
> > > > > > > > > > > > >     >     The PR is available at [3].
> > > > > > > > > > > >     >
> > > > > > > > > > > > >     >     This means that we would stop verifying CUDA 8 and CuDNN 5.0 as part of
> > > > > > > > > > > > >     >     our PR process. At a later point in time, this could be picked up as a
> > > > > > > > > > > > >     >     candidate for an integration test as part of the nightly suite.
> > > > > > > > > > > >     >
> > > > > > > > > > > > >     >     This is a lazy vote, ending on 17th of March, 2018 at 17:00 (UTC +1).
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >     Best regards,
> > > > > > > > > > > >     >     Marco
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >
> > > > > > > > > > > > >     >     [1]: https://issues.apache.org/jira/browse/MXNET-99
> > > > > > > > > > > > >     >     [2]: https://github.com/apache/incubator-mxnet/pull/9552
> > > > > > > > > > > > >     >     [3]: https://github.com/apache/incubator-mxnet/pull/10108
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >
> > > > > > > > > > > >     >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
