Re: [LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0

Haibin Lin Wed, 16 May 2018 22:57:07 -0700

Is there a plan for adding those CUDA 8 tests back to CI? What about CUDA 7?


There were a few build problems in the past few weeks due to lack of CI
coverage:
- https://github.com/apache/incubator-mxnet/pull/10710 was found during
1.2 rc voting
- https://github.com/apache/incubator-mxnet/issues/10981 was reported by
a user with CUDA 7

Having these covered in CI will help catch the issues early. I don't recall
if we decided to drop CUDA 7 support for MXNet.

Best,
Haibin

On Wed, Mar 21, 2018 at 6:32 AM, Marco de Abreu <
[email protected]> wrote:

> Hello,
>
> the migration has just been completed and we're now running our UNIX based
> slaves on CUDA 9.1 with CuDNN 7. The commit is available at
> https://github.com/apache/incubator-mxnet/commit/b0a6760efa141aeca87b03ecf34dae924bd1af46
>
> No jobs have been interrupted by this migration. If you encounter any
> errors, please reach out to me.
>
> Best regards,
> Marco
>
> On Tue, Mar 20, 2018 at 11:20 PM, Marco de Abreu <
> [email protected]> wrote:
>
> > Hello,
> >
> > the results of this vote are as follows:
> >
> > +1:
> > Jun
> > Anirudh
> > Hao
> > Marco
> >
> > 0:
> > Chris
> >
> > -1:
> > Naveen (veto recalled as of
> > https://lists.apache.org/thread.html/242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%3Cdev.mxnet.apache.org%3E)
> >
> > Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on
> > UNIX slaves and work on integration tests for CUDA 8 in the long term, this
> > vote counts as PASSED.
> >
> > The PR for this change is available at
> > https://github.com/apache/incubator-mxnet/pull/10108. I have developed and
> > tested the new slaves in our test environment and everything looks
> > promising so far. The plan is as follows:
> >
> >    1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved
> >    to allow self-merge – CI can’t pass until slaves have been upgraded.
> >    2. Replace all existing slaves with new upgraded slaves.
> >    3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to
> >    merge necessary changes into master.
> >
> > IMPORTANT: The migration will happen tomorrow, so please expect some delay
> > in job execution - the CI website will be unaffected. Ideally, no jobs
> > should fail - in case they do, please feel free to retrigger them by using
> > an empty commit. In case of any errors appearing after the upgrade, don't
> > hesitate to contact me!
> >
> > Best regards,
> > Marco
> >
> >
> > On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy <[email protected]> wrote:
> >
> >> Yes, for short-term.
> >>
> >> On Monday, March 19, 2018, Chris Olivier <[email protected]> wrote:
> >>
> >> > In the short term, Naveen, are you ok with Linux running CUDA 9 and
> >> > Windows CUDA 8 in order to get CUDA version coverage?
> >> >
> >> > On 2018/03/16 21:09:09, Marco de Abreu <[email protected]> wrote:
> >> > > Thanks for your input. How would you propose to proceed in terms of a
> >> > > timeline in case this vote succeeds? I don't really have time to work
> >> > > on a nightly setup right now. Would anybody in the community be able
> >> > > to help me out here or shall we wait with the migration until a
> >> > > nightly setup for CUDA 8 is up?
> >> > >
> >> > > -Marco
> >> > >
> >> > > On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > +1 to the suggestion of testing CUDA8 in a few nightly instances and
> >> > > > using CUDA9 for most instances in CI.
> >> > > >
> >> > > > Bhavin Thaker.
> >> > > >
> >> > > > On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy <[email protected]> wrote:
> >> > > >
> >> > > > > I think it's best to add support for CUDA 9.0 while retaining existing
> >> > > > > support for CUDA 8; code might regress when you remove it, creating
> >> > > > > more work to add CUDA 8 support back.
> >> > > > >
> >> > > > > On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu <
> >> > > > > [email protected]> wrote:
> >> > > > >
> >> > > > > > Yeah, sorry Chris, mixed up the names.
> >> > > > > >
> >> > > > > > @Naveen: Would you be fine with doing the switch now and adding
> >> > > > > > integration tests later or is this a hard constraint for you?
> >> > > > > >
> >> > > > > > On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier <[email protected]>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Isn't the Titan V the Volta and not the Tesla?
> >> > > > > > >
> >> > > > > > > On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy <[email protected]> wrote:
> >> > > > > > >
> >> > > > > > > > Marco,
> >> > > > > > > > My -1 vote is against dropping support for CUDA 8, not against
> >> > > > > > > > adding CUDA 9. CUDA 9.0 support for MXNet was added on Oct 30, 2017;
> >> > > > > > > > I think that not all users may have switched to CUDA 9.0 yet.
> >> > > > > > > >
> >> > > > > > > > Look at the earlier discussion on the same topic
> >> > > > > > > >
> >> > > > > > > > https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E
> >> > > > > > > >
> >> > > > > > > > On Wed, Mar 14, 2018 at 10:14 AM, Marco de Abreu <
> >> > > > > > > > [email protected]> wrote:
> >> > > > > > > >
> >> > > > > > > > > Right, the code changes would not be validated against CUDA 8.0 as
> >> > > > > > > > > part of the PR process.
> >> > > > > > > > >
> >> > > > > > > > > I don't have any numbers, but it's pretty unlikely that anybody is
> >> > > > > > > > > still using CUDA 8.0. According to
> >> > > > > > > > > https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the devices which
> >> > > > > > > > > are not supported by CUDA 9 are those under the Fermi architecture,
> >> > > > > > > > > which was released in April 2010. These GPUs are way too old, so I
> >> > > > > > > > > think we're safe with not covering them specifically - this does not
> >> > > > > > > > > mean we're entirely deprecating them.
> >> > > > > > > > >
> >> > > > > > > > > One thing to note here is that we're not testing CUDA 9 as of now.
> >> > > > > > > > > Considering that the Tesla architecture (Titan V, V100) requires at
> >> > > > > > > > > least CUDA 9 and those are probably the most widely used GPUs for
> >> > > > > > > > > Deep Learning, we'd probably be covering a wider user-base in
> >> > > > > > > > > comparison to CUDA 8 if we make that switch.
> >> > > > > > > > >
> >> > > > > > > > > -Marco
> >> > > > > > > > >
> >> > > > > > > > > On Wed, Mar 14, 2018 at 5:59 PM, Naveen Swamy <[email protected]> wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Does this mean that MXNet users who use CUDA 8.0 will not be
> >> > > > > > > > > > supported (since you are going to stop testing CUDA 8.0)? I suggest
> >> > > > > > > > > > we at least have nightly tests for CUDA 8.0.
> >> > > > > > > > > >
> >> > > > > > > > > > Do you have a sense of how many users are using CUDA 8.0/9.0?
> >> > > > > > > > > >
> >> > > > > > > > > > -1
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > On Wed, Mar 14, 2018 at 9:50 AM, Chris Olivier <[email protected]> wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > +0
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Wed, Mar 14, 2018 at 9:45 AM, Jin, Hao <[email protected]> wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > +1
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On 3/14/18, 9:04 AM, "Anirudh" <[email protected]> wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >     +1
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >     On Mar 14, 2018 8:56 AM, "Wu, Jun" <[email protected]> wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >     > +1
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     > On 3/14/18, 8:52 AM, "Marco de Abreu" <[email protected]> wrote:
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     >     Hello,
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     >     this is a vote to upgrade our CI environment from the current
> >> > > > > > > > > > > >     >     CUDA 8.0 with CuDNN 5.0 to CUDA 9.1 with CuDNN 7.0. The reason is
> >> > > > > > > > > > > >     >     that NVCC under CUDA 8 does not support the Volta GPUs used in
> >> > > > > > > > > > > >     >     AWS P3 instances, which limits our test capabilities. More
> >> > > > > > > > > > > >     >     details are available at [1].
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     >     In order to introduce support for Quantization [2], I'd like to
> >> > > > > > > > > > > >     >     perform a system-wide upgrade. This should have no negative
> >> > > > > > > > > > > >     >     impact on our users but rather makes sure that we're actually
> >> > > > > > > > > > > >     >     testing with the latest versions. The PR is available at [3].
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     >     This means that we would stop verifying CUDA 8 and CuDNN 5.0 as
> >> > > > > > > > > > > >     >     part of our PR process. At a later point in time, this could be
> >> > > > > > > > > > > >     >     picked up as a candidate for an integration test as part of the
> >> > > > > > > > > > > >     >     nightly suite.
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     >     This is a lazy vote, ending on 17th of March, 2018 at 17:00
> >> > > > > > > > > > > >     >     (UTC +1).
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     >     Best regards,
> >> > > > > > > > > > > >     >     Marco
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     >     [1]: https://issues.apache.org/jira/browse/MXNET-99
> >> > > > > > > > > > > >     >     [2]: https://github.com/apache/incubator-mxnet/pull/9552
> >> > > > > > > > > > > >     >     [3]: https://github.com/apache/incubator-mxnet/pull/10108
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >     >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
