Re: [LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0

Marco de Abreu Thu, 17 May 2018 06:58:53 -0700

Hello Haibin,

I'd love to see CUDA 8 back in CI, but we're currently lacking people to do
this properly (besides just copy&pasting the job). Since we agreed on only
supporting the last 2 CUDA major versions, we don't have to verify CUDA 7.


The way to go forward is to have things like these in the nightly test
cycle. At the moment, we don't have the manpower to maintain and improve
that suite, so we'll have to wait until we have more people or somebody is
willing to take this on themselves. I'd be happy to support volunteers here!

Best regards,
Marco

On Thu, May 17, 2018 at 7:56 AM, Haibin Lin <[email protected]>
wrote:

> Is there a plan for adding those CUDA 8 tests back to CI? What about CUDA
> 7?
>
> There were a few build problems in the past few weeks due to lack of CI
> coverage:
> - https://github.com/apache/incubator-mxnet/pull/10710 was found during
> 1.2 rc voting
> - https://github.com/apache/incubator-mxnet/issues/10981 was reported by
> a user with CUDA 7
>
> Having these covered in CI will help catch the issues early. I don't recall
> if we decided to drop CUDA 7 support for MXNet.
>
> Best,
> Haibin
>
> On Wed, Mar 21, 2018 at 6:32 AM, Marco de Abreu <
> [email protected]> wrote:
>
> > Hello,
> >
> > the migration has just been completed and we're now running our UNIX-based
> > slaves on CUDA 9.1 with CuDNN 7. The commit is available at
> > https://github.com/apache/incubator-mxnet/commit/b0a6760efa141aeca87b03ecf34dae924bd1af46
> > .
> >
> > No jobs have been interrupted by this migration. If you encounter any
> > errors, please reach out to me.
> >
> > Best regards,
> > Marco
> >
> > On Tue, Mar 20, 2018 at 11:20 PM, Marco de Abreu <
> > [email protected]> wrote:
> >
> > > Hello,
> > >
> > > the results of this vote are as follows:
> > >
> > > +1:
> > > Jun
> > > Anirudh
> > > Hao
> > > Marco
> > >
> > > 0:
> > > Chris
> > >
> > > -1:
> > > Naveen (veto recalled as of
> > > https://lists.apache.org/thread.html/242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%3Cdev.mxnet.apache.org%3E)
> > >
> > > Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on
> > > UNIX slaves and work on integration tests for CUDA 8 in the long term,
> > > this vote counts as PASSED.
> > >
> > > The PR for this change is available at
> > > https://github.com/apache/incubator-mxnet/pull/10108. I have developed
> > > and tested the new slaves in our test environment and everything looks
> > > promising so far. The plan is as follows:
> > >
> > >    1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved
> > >    to allow self-merge – CI can’t pass until slaves have been upgraded.
> > >    2. Replace all existing slaves with new upgraded slaves.
> > >    3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to
> > >    merge necessary changes into master.
> > >
> > > IMPORTANT: The migration will happen tomorrow, so please expect some
> > > delay in job execution - the CI website will be unaffected. Ideally, no
> > > jobs should fail - in case they do, please feel free to retrigger them by
> > > using an empty commit. In case of any errors appearing after the upgrade,
> > > don't hesitate to contact me!
> > >
> > > Best regards,
> > > Marco
> > >
> > >
> > > On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy <[email protected]>
> > wrote:
> > >
> > >> Yes, for short-term.
> > >>
> > >> On Monday, March 19, 2018, Chris Olivier <[email protected]>
> > wrote:
> > >>
> > > >> > In the short term, Naveen, are you ok with Linux running CUDA 9 and
> > > >> > Windows CUDA 8 in order to get CUDA version coverage?
> > >> >
> > >> > On 2018/03/16 21:09:09, Marco de Abreu <
> [email protected]>
> > >> > wrote:
> > > >> > > Thanks for your input. How would you propose to proceed in terms of
> > > >> > > a timeline in case this vote succeeds? I don't really have time to
> > > >> > > work on a nightly setup right now. Would anybody in the community be
> > > >> > > able to help me out here or shall we wait with the migration until a
> > > >> > > nightly setup for CUDA 8 is up?
> > >> > >
> > >> > > -Marco
> > >> > >
> > >> > > On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker <
> > >> [email protected]>
> > >> > > wrote:
> > >> > >
> > > >> > > > +1 to the suggestion of testing CUDA8 in a few nightly instances
> > > >> > > > and using CUDA9 for most instances in CI.
> > >> > > >
> > >> > > > Bhavin Thaker.
> > >> > > >
> > >> > > > On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy <
> [email protected]
> > >
> > >> > wrote:
> > >> > > >
> > > >> > > > > I think it's best to add support for CUDA 9.0 while retaining
> > > >> > > > > existing support for CUDA 8. Code might regress when you remove
> > > >> > > > > it, creating more work to add CUDA 8 support back.
> > >> > > > >
> > >> > > > > On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu <
> > >> > > > > [email protected]> wrote:
> > >> > > > >
> > >> > > > > > Yeah, sorry Chris, mixed up the names.
> > >> > > > > >
> > > >> > > > > > @Naveen: Would you be fine with doing the switch now and
> > > >> > > > > > adding integration tests later or is this a hard constraint
> > > >> > > > > > for you?
> > >> > > > > >
> > >> > > > > > On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier <
> > >> > [email protected]>
> > >> > > > > > wrote:
> > >> > > > > >
> > > >> > > > > > > Isn't the Titan V the Volta and not the Tesla?
> > >> > > > > > >
> > >> > > > > > > On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy <
> > >> > [email protected]>
> > >> > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Marco,
> > > >> > > > > > > > My -1 vote is about dropping support for CUDA 8 and not
> > > >> > > > > > > > about adding CUDA 9. CUDA 9.0 support for MXNet was added
> > > >> > > > > > > > on Oct 30, 2017; I think that not all users may have
> > > >> > > > > > > > switched to CUDA 9.0 yet.
> > >> > > > > > > >
> > >> > > > > > > > Look at the earlier discussion on the same topic
> > >> > > > > > > >
> > > >> > > > > > > > https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E
> > >> > > > > > > >
> > >> > > > > > > > On Wed, Mar 14, 2018 at 10:14 AM, Marco de Abreu <
> > >> > > > > > > > [email protected]> wrote:
> > >> > > > > > > >
> > > >> > > > > > > > > Right, the code changes would not be validated against
> > > >> > > > > > > > > CUDA 8.0 as part of the PR process.
> > >> > > > > > > > >
> > > >> > > > > > > > > I don't have any numbers, but it's pretty unlikely that
> > > >> > > > > > > > > anybody is still using CUDA 8.0. According to
> > > >> > > > > > > > > https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the
> > > >> > > > > > > > > devices which are not supported by CUDA 9 are those
> > > >> > > > > > > > > under the Fermi architecture, which was released in
> > > >> > > > > > > > > April 2010. These GPUs are way too old, so I think
> > > >> > > > > > > > > we're safe with not covering them specifically - this
> > > >> > > > > > > > > does not mean we're entirely deprecating them.
> > >> > > > > > > > >
> > > >> > > > > > > > > One thing to note here is that we're not testing CUDA 9
> > > >> > > > > > > > > as of now. Considering that the Tesla architecture
> > > >> > > > > > > > > (Titan V, V100) requires at least CUDA 9 and those are
> > > >> > > > > > > > > probably the most widely used GPUs for Deep Learning,
> > > >> > > > > > > > > we'd probably be covering a wider user-base in
> > > >> > > > > > > > > comparison to CUDA 8 if we make that switch.
> > >> > > > > > > > >
> > >> > > > > > > > > -Marco
> > >> > > > > > > > >
> > >> > > > > > > > > On Wed, Mar 14, 2018 at 5:59 PM, Naveen Swamy <
> > >> > > > [email protected]>
> > >> > > > > > > > wrote:
> > >> > > > > > > > >
> > > >> > > > > > > > > > Does this mean that MXNet users who use CUDA 8.0 will
> > > >> > > > > > > > > > not be supported (since you are going to stop testing
> > > >> > > > > > > > > > CUDA 8.0)? I suggest we at least have nightly tests
> > > >> > > > > > > > > > for CUDA 8.0.
> > >> > > > > > > > > >
> > > >> > > > > > > > > > Do you have a sense of how many users are using CUDA
> > > >> > > > > > > > > > 8.0/9.0?
> > >> > > > > > > > > >
> > >> > > > > > > > > > -1
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Wed, Mar 14, 2018 at 9:50 AM, Chris Olivier <
> > >> > > > > > > [email protected]>
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > +0
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > On Wed, Mar 14, 2018 at 9:45 AM, Jin, Hao <
> > >> > [email protected]>
> > >> > > > > > wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > > +1
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > On 3/14/18, 9:04 AM, "Anirudh" <
> > >> [email protected]
> > >> > >
> > >> > > > > wrote:
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >     +1
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >     On Mar 14, 2018 8:56 AM, "Wu, Jun" <
> > >> > [email protected]>
> > >> > > > > > wrote:
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >     > +1
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     > On 3/14/18, 8:52 AM, "Marco de Abreu" <
> > >> > > > > > > > > > > [email protected]>
> > >> > > > > > > > > > > >     > wrote:
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     Hello,
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     this is a vote to upgrade our CI environment from the current
> > >> > > > > > > > > > > >     >     CUDA 8.0 with CuDNN 5.0 to CUDA 9.1 with CuDNN 7.0. The reason is
> > >> > > > > > > > > > > >     >     that NVCC under CUDA 8 does not support the Volta GPUs used in AWS
> > >> > > > > > > > > > > >     >     P3 instances, which limits our test capabilities. More details are
> > >> > > > > > > > > > > >     >     available at [1].
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     In order to introduce support for Quantization [2], I'd like to
> > >> > > > > > > > > > > >     >     perform a system-wide upgrade. This should have no negative impact
> > >> > > > > > > > > > > >     >     on our users but rather makes sure that we're actually testing
> > >> > > > > > > > > > > >     >     with the latest versions. The PR is available at [3].
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     This means that we would stop verifying CUDA 8 and CuDNN 5.0 as
> > >> > > > > > > > > > > >     >     part of our PR process. At a later point in time, this could be
> > >> > > > > > > > > > > >     >     picked up as a candidate for an integration test as part of the
> > >> > > > > > > > > > > >     >     nightly suite.
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     This is a lazy vote, ending on 17th of March, 2018 at 17:00
> > >> > > > > > > > > > > >     >     (UTC +1).
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     Best regards,
> > >> > > > > > > > > > > >     >     Marco
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >     [1]: https://issues.apache.org/jira/browse/MXNET-99
> > >> > > > > > > > > > > >     >     [2]: https://github.com/apache/incubator-mxnet/pull/9552
> > >> > > > > > > > > > > >     >     [3]: https://github.com/apache/incubator-mxnet/pull/10108
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >     >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
