Great idea, Marco! Anything you think would be valuable to share would
be good. The duration of each node in the test stage sounds like a good
start.

- Carin

On Wed, Aug 14, 2019 at 2:48 PM Marco de Abreu <marco.g.ab...@gmail.com>
wrote:

> Hi,
>
> We record a bunch of metrics about run statistics (down to the duration of
> every individual step). If you tell me which ones you're particularly
> interested in (probably the total duration of each node in the test
> stage), I'm happy to provide them.
>
> Dimensions are (in hierarchical order):
> - job
> - branch
> - stage
> - node
> - step
>
> Unfortunately, I can't export them, since we store them in CloudWatch
> Metrics, which AFAIK doesn't offer raw exports.
>
> Best regards,
> Marco
>
> Carin Meier <carinme...@gmail.com> wrote on Wed, Aug 14, 2019, 19:43:
>
> > I would prefer to keep the language bindings in the PR process. Perhaps
> > we could do some analytics to see how much each of the language bindings
> > contributes to the overall run time.
> > If we have some metrics on that, maybe we can come up with a guideline
> > for how much time each should take. Another possibility is to leverage
> > parallel builds more.
> >
> > On Wed, Aug 14, 2019 at 1:30 PM Pedro Larroy <pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > Hi Carin.
> > >
> > > That's a good point. All things considered, would you prefer to keep
> > > the Clojure tests in the PR process or move them to nightly?
> > > Some options are sending notifications here or in Slack. But if we
> > > think breakages would go unnoticed, maybe it's not a good idea to fully
> > > remove bindings from the PR process, and we should just streamline the
> > > process instead.
> > >
> > > Pedro.
> > >
> > > On Wed, Aug 14, 2019 at 5:09 AM Carin Meier <carinme...@gmail.com>
> > > wrote:
> > >
> > > > Before any binding tests are moved to nightly, I think we need to
> > > > figure out how the community can get proper notifications of failure
> > > > and success on those nightly runs. Otherwise, I think breakages would
> > > > go unnoticed.
> > > >
> > > > -Carin
> > > >
> > > > On Tue, Aug 13, 2019 at 7:47 PM Pedro Larroy <pedro.larroy.li...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > It seems we are hitting some problems in CI. I propose the following
> > > > > action items to remedy the situation, accelerate turnaround times in
> > > > > CI, and reduce cost, complexity, and the probability of failures
> > > > > blocking PRs and frustrating developers:
> > > > >
> > > > > * Upgrade Visual Studio on Windows from VS 2015 to VS 2017. The
> > > > > build_windows.py infrastructure should work with the new version
> > > > > without much effort. Some PRs are currently blocked by this:
> > > > > https://github.com/apache/incubator-mxnet/issues/13958
> > > > > * Move the Gluon Model Zoo tests to nightly. Tracked at
> > > > > https://github.com/apache/incubator-mxnet/issues/15295
> > > > > * Move non-Python binding tests to nightly. If a commit touches other
> > > > > bindings, the reviewer should ask for a full run, which can be done
> > > > > locally, triggered as a full CI build via the label bot, or deferred
> > > > > to nightly.
> > > > > * Provide a couple of basic performance sanity tests on small models
> > > > > that run in CI and whose results the label bot can echo as a comment
> > > > > on PRs.
> > > > > * Address unit tests that take more than 10-20 s: streamline them, or
> > > > > move them to nightly if that can't be done.
> > > > > * Open-source the remaining CI infrastructure scripts so the
> > > > > community can contribute.
> > > > >
> > > > > I think our goal should be a turnaround time under 30 minutes.
> > > > >
> > > > > I would also like to raise with the community that some PRs are not
> > > > > being followed up on by committers who requested changes. For
> > > > > example, this PR is important and has been hanging for a long time:
> > > > >
> > > > > https://github.com/apache/incubator-mxnet/pull/15051
> > > > >
> > > > > Here is another one, less important but easier to review:
> > > > >
> > > > > https://github.com/apache/incubator-mxnet/pull/14940
> > > > >
> > > > > I think committers requesting changes and not following up within a
> > > > > reasonable time is not healthy for the project. I suggest
> > > > > configuring GitHub notifications for a good signal-to-noise ratio
> > > > > and following up.
> > > > >
> > > > > Regards.
> > > > >
> > > > > Pedro.
> > > > >
> > > >
> > >
> >
>
