Hello dev@,

I would like to ask whether there is interest in the community in testing
nightly builds of MXNet against third-party packages that depend on MXNet,
with those packages acting as early adopters. The goal is to catch
regressions in MXNet early, leaving time for bug fixes before a new release
is cut.

For example, Sockeye <https://github.com/awslabs/sockeye> is a consumer of
new MXNet releases and aims to upgrade to the latest MXNet as soon as
possible. Typically, we update our dependency on MXNet once a new release
becomes available (through pip). However, there have been cases where new
releases of MXNet introduced regressions that went undetected by MXNet's
tests (and hence passed the release process). The latest example is this
issue <https://github.com/apache/incubator-mxnet/issues/13862>, which may
have been introduced as early as October but, due to infrequent MXNet
releases, has only surfaced recently and will most likely force us to wait
for a post-release or a 1.4.1 release. In this particular case, Sockeye's
tests would have detected the regression, and the issue could have been
filed back in October, potentially keeping it out of the 1.4.0 release.

More generally, I think there are several third-party packages with
valuable test suites (e.g. gluon-nlp) that can contribute to catching MXNet
regressions or incompatibilities early. Running these test suites for each
and every PR or commit on the MXNet main repo would be too much overhead.
My proposal is to trigger these tests against the nightly builds (pip
releases) of MXNet in a separate CI pipeline that notifies the third-party
maintainers in case of failure, but does not block MXNet development (or
nightly build releases) in any way.

Roughly it would do the following:

   - pip install mxnet--<date>
   - for each 3p package that is part of the pipeline:
      - clone/set up package
      - run unit/integration tests of package with some timeout
      - in case of failure, notify package owner
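
To make the steps above more concrete, here is a minimal Python sketch of
such a pipeline. It is only an illustration under my own assumptions, not a
description of any existing CI job: the Package fields, nightly_check,
run_cmd and log_notify are hypothetical names, and mxnet_spec stands for
whatever pip requirement string identifies the nightly build.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import List


@dataclass
class Package:
    """A third-party package in the pipeline (all fields are hypothetical)."""
    name: str
    repo_url: str
    setup_cmd: List[str]   # e.g. installs the package's own requirements
    test_cmd: List[str]    # e.g. the package's unit/integration test command
    owner: str             # address to notify on failure


def run_cmd(cmd, cwd=None, timeout=None):
    """Return True if cmd exits with status 0 before the timeout expires."""
    try:
        return subprocess.run(cmd, cwd=cwd, timeout=timeout).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


def log_notify(owner, message):
    """Stand-in notifier; a real pipeline might send mail or file an issue."""
    print(f"notify {owner}: {message}")


def nightly_check(packages, mxnet_spec, run=run_cmd, notify=log_notify,
                  timeout=3600):
    """Install the nightly MXNet build, test each package, report failures."""
    if not run([sys.executable, "-m", "pip", "install", mxnet_spec]):
        raise RuntimeError(f"could not install {mxnet_spec}")
    failed = []
    for pkg in packages:
        # Each package gets a fresh checkout in its own temporary directory.
        with tempfile.TemporaryDirectory() as workdir:
            ok = (run(["git", "clone", "--depth", "1", pkg.repo_url, workdir])
                  and run(pkg.setup_cmd, cwd=workdir)
                  and run(pkg.test_cmd, cwd=workdir, timeout=timeout))
        if not ok:
            # A failure is recorded and reported, but the loop continues.
            failed.append(pkg.name)
            notify(pkg.owner, f"{pkg.name} failed against {mxnet_spec}")
    return failed
```

The function returns the names of the failing packages instead of raising,
so one broken package neither stops the remaining ones nor blocks anything
on the MXNet side. One caveat: a package's setup step may install its own
pinned MXNet version over the nightly build, so the setup command would need
to avoid that.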

I am not familiar with the current CI pipelines, their requirements and
resources. It would be great if someone from the CI team could chime in and
evaluate whether such a proposal seems doable and worthwhile.


