Great idea Marco! Anything that you think would be valuable to share would be good. The duration of each node in the test stage sounds like a good start.
- Carin

On Wed, Aug 14, 2019 at 2:48 PM Marco de Abreu <marco.g.ab...@gmail.com> wrote:

> Hi,
>
> we record a bunch of metrics about run statistics (down to the duration of
> every individual step). If you tell me which ones you're particularly
> interested in (probably total duration of each node in the test stage), I'm
> happy to provide them.
>
> Dimensions are (in hierarchical order):
> - job
> - branch
> - stage
> - node
> - step
>
> Unfortunately I can't export them, since we store them in CloudWatch
> Metrics, which afaik doesn't offer raw exports.
>
> Best regards,
> Marco
>
> Carin Meier <carinme...@gmail.com> wrote on Wed, Aug 14, 2019, 19:43:
>
> > I would prefer to keep the language bindings in the PR process. Perhaps we
> > could do some analytics to see how much each of the language bindings is
> > contributing to overall run time.
> > If we have some metrics on that, maybe we can come up with a guideline of
> > how much time each should take. Another possibility is to leverage the
> > parallel builds more.
> >
> > On Wed, Aug 14, 2019 at 1:30 PM Pedro Larroy <pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > Hi Carin.
> > >
> > > That's a good point. All things considered, would your preference be to
> > > keep the Clojure tests as part of the PR process or in Nightly?
> > > Some options are having notifications here or in Slack. But if we think
> > > breakages would go unnoticed, maybe it is not a good idea to fully remove
> > > bindings from the PR process, and we should just streamline the process.
> > >
> > > Pedro.
> > >
> > > On Wed, Aug 14, 2019 at 5:09 AM Carin Meier <carinme...@gmail.com> wrote:
> > >
> > > > Before any binding tests are moved to nightly, I think we need to figure
> > > > out how the community can get proper notifications of failure and success
> > > > on those nightly runs. Otherwise, I think that breakages would go
> > > > unnoticed.
> > > >
> > > > -Carin
> > > >
> > > > On Tue, Aug 13, 2019 at 7:47 PM Pedro Larroy <pedro.larroy.li...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > Seems we are hitting some problems in CI. I propose the following action
> > > > > items to remedy the situation and accelerate turnaround times in CI,
> > > > > reduce cost, complexity and probability of failure blocking PRs and
> > > > > frustrating developers:
> > > > >
> > > > > * Upgrade Windows Visual Studio from VS 2015 to VS 2017. The
> > > > > build_windows.py infrastructure should easily work with the new version.
> > > > > Currently some PRs are blocked by this:
> > > > > https://github.com/apache/incubator-mxnet/issues/13958
> > > > > * Move Gluon Model zoo tests to nightly. Tracked at
> > > > > https://github.com/apache/incubator-mxnet/issues/15295
> > > > > * Move non-python bindings tests to nightly. If a commit is touching other
> > > > > bindings, the reviewer should ask for a full run which can be done locally,
> > > > > use the label bot to trigger a full CI build, or defer to nightly.
> > > > > * Provide a couple of basic sanity performance tests on small models that
> > > > > are run on CI and can be echoed by the label bot as a comment for PRs.
> > > > > * Address unit tests that take more than 10-20s, streamline them or move
> > > > > them to nightly if it can't be done.
> > > > > * Open source the remaining CI infrastructure scripts so the community
> > > > > can contribute.
> > > > >
> > > > > I think our goal should be turnaround under 30min.
> > > > >
> > > > > I would also like to touch base with the community that some PRs are not
> > > > > being followed up by committers asking for changes. For example this PR is
> > > > > important and has been hanging for a long time.
> > > > >
> > > > > https://github.com/apache/incubator-mxnet/pull/15051
> > > > >
> > > > > This is another, less important but more trivial to review:
> > > > >
> > > > > https://github.com/apache/incubator-mxnet/pull/14940
> > > > >
> > > > > I think committers requesting changes and not following up in reasonable
> > > > > time is not healthy for the project. I suggest configuring GitHub
> > > > > Notifications for a good SNR and following up.
> > > > >
> > > > > Regards.
> > > > >
> > > > > Pedro.