To follow up on the Jenkins Job Cacher Plugin: Using a Jenkins plugin to save and reuse the Gradle cache for successive precommit jobs. The problem with this approach is that the precommit runs that a Jenkins server runs are unrelated. Say you have 2 PRs, A and B, and the precommit job for B reuses the cache left by the job for A. The diff between the two will cause tests affected both by A and B to be rerun (at least). If A modifies Python code, then the job for B must rerun ALL Python tests (since Gradle doesn't do dependency tracking for Python).
Proposal: a. The cache plugin is still useful for successive Java precommit jobs, but not for Python. (Go, I have no idea) We could use it exclusively for Java precommits. b. To avoid running precommit jobs for code not touched by a PR, look at the paths of files changed. For example, a PR touching only files under sdks/python/... need only run Python precommit tests. On Tue, Jun 5, 2018 at 7:24 PM Udi Meiri <eh...@google.com> wrote: > I've been having a separate discussion on the proposal doc, which is ready > for another round of reviews. > Change summary: > - Changed fast requirement to be < 30 minutes and simplify the check as an > aggregate for each precommit job type. > - Updated slowness notification methods to include automated methods: as a > precommit check result type on GitHub, as a bug. > - Merged in the metrics design doc. > - Added detailed design section. > - Added list of deliverables. > > What I would like is consensus regarding: > - How fast we want precommit runs to be. I propose 30m. > - Deadline for fixing a slow test before it is temporarily removed from > precommit. I propose 24 hours. > > > Replying to the thread: > > 1. I like the idea of using the Jenkins Job Cacher Plugin to skip > unaffected tests (BEAM-4400). > > 2. Java Precommit tests include integration tests (example > <https://builds.apache.org/view/A-D/view/Beam/job/beam_PreCommit_Java_GradleBuild/lastCompletedBuild/testReport/org.apache.beam.examples/> > ). > We could split these out to get much faster results, i.e., a separate > precommit just for basic integration tests (which will still need to run in > <30m). > Perhaps lint checks for Python could be split out as well. > > I'll add these suggestions to the doc tomorrow. > > On Thu, May 24, 2018 at 9:25 AM Scott Wegner <sweg...@google.com> wrote: > >> So, it sounds like there's agreement that we should improve precommit >> times by only running necessary tests, and configuring Jenkins Job >> Caching + Gradle build cache is a path to get there. I've filed BEAM-4400 >> [1] to follow-up on this. >> >> Getting back to Udi's original proposal [2]: I see value in defining a >> metric and target for overall pre-commit timing. The proposal for an >> initial "2 hour" target is helpful as a guardrail: we're already hitting >> it, but if we drift to a point where we're not, that should trigger some >> action to be taken to get back to a healthy state. >> >> I wouldn't mind separately setting a more aspiration goal of getting the >> pre-commits even faster (i.e. 15-30 mins), but I suspect that would require >> a concerted effort to evaluate and improve existing tests across the >> codebase. One idea would be to set up ensure the metric reporting can show >> the trend, and which tests are responsible for the most walltime, so that >> we know where to invest any efforts to improve tests. >> >> >> [1] https://issues.apache.org/jira/browse/BEAM-4400 >> [2] >> https://docs.google.com/document/d/1udtvggmS2LTMmdwjEtZCcUQy6aQAiYTI3OrTP8CLfJM/edit?usp=sharing >> >> >> On Wed, May 23, 2018 at 11:46 AM Kenneth Knowles <k...@google.com> wrote: >> >>> With regard to the Job Cacher Plugin: I think it is an infra ticket to >>> install? And I guess we need it longer term when we move to containerized >>> builds anyhow? One thing I've experienced with the Travis-CI cache is that >>> the time spent uploading & downloading the remote cache - in that case of >>> all the pip installed dependencies - negated the benefits. Probably for >>> Beam it will have a greater benefit if we can skip most of the build. >>> >>> Regarding integration tests in precommit: I think it is OK to run maybe >>> one Dataflow job in precommit, but it should be in parallel with the unit >>> tests and just a smoke test that takes 5 minutes, not a suite that takes 35 >>> minutes. So IMO that is low-hanging fruit. If this would make postcommit >>> unstable, then it also means precommit is unstable. Both are troublesome. >>> >>> More short term, some possible hacks: >>> >>> - Point gradle to cache outside the git workspace. We already did this >>> for .m2 and it helped a lot. >>> - Intersect touched files with projects. Our nonstandard project names >>> might be a pain here. Not sure if fixing that is on the roadmap. >>> >>> Kenn >>> >>> On Wed, May 23, 2018 at 9:31 AM Ismaël Mejía <ieme...@gmail.com> wrote: >>> >>>> I second Robert idea of ‘inteligently’ running only the affected tests, >>>> probably >>>> there is no need to run Java for a go fix (and eventually if any issue >>>> it >>>> can be >>>> catched in postcommit), same for a dev who just fixed something in >>>> KafkaIO >>>> and has >>>> to wait for other IO tests to pass. I suppose that languages, IOs and >>>> extensions >>>> are ‘easy’ to isolate so maybe we can start with those. >>>> >>>> Earlier signals are also definitely great to have too, but not sure how >>>> we >>>> can >>>> have those with the current infra. >>>> >>>> From a quicklook the biggest time is consumed by the examples module >>>> probably >>>> because they run in Dataflow with real IOs no?, that module alone takes >>>> ~35 >>>> minutes, so maybe moving it to postcommit will gain us some quick >>>> improvement. >>>> On the other hand we should probably not dismiss the consequences of >>>> moving >>>> more >>>> stuff to postcommit given that our current postcommit is not the most >>>> stable, or >>>> the quickest, only the Dataflow suite takes 1h30! >>>> >>>> >>>> On Tue, May 22, 2018 at 12:01 AM Mikhail Gryzykhin <mig...@google.com> >>>> wrote: >>>> >>>> > What we can do here is estimate how much effort we want to put in and >>>> set >>>> remote target. >>>> > Such as: >>>> > Third quarter 2018 -- 1hr SLO >>>> > Forth quarter 2018 -- 30min SLO, >>>> > etc. >>>> >>>> > Combined with policy for newly added tests, this can give us some >>>> goal to >>>> aim for. >>>> >>>> > --Mikhail >>>> >>>> > Have feedback? >>>> >>>> >>>> > On Mon, May 21, 2018 at 2:06 PM Scott Wegner <sweg...@google.com> >>>> wrote: >>>> >>>> >> Thanks for the proposal, I left comments in the doc. Overall I think >>>> it's a great idea. >>>> >>>> >> I've seen other projects with much faster pre-commits, and it >>>> requires >>>> strict guidelines on unit test design and keeping tests isolated >>>> in-memory >>>> as much as possible. That's not currently the case in Java; we have >>>> pre-commits which submit pipelines to Dataflow service. >>>> >>>> >> I don't know if it's feasible to get Java down to 15-20 mins in the >>>> short term, but a good starting point would be to document the >>>> requirements >>>> for a test to run as pre-commit, and start enforcing it for new tests. >>>> >>>> >>>> >> On Fri, May 18, 2018 at 3:25 PM Henning Rohde <hero...@google.com> >>>> wrote: >>>> >>>> >>> Good proposal. I think it should be considered in tandem with the >>>> "No >>>> commit on red post-commit" proposal and could be far more ambitious >>>> than 2 >>>> hours. For example, something in the <15-20 mins range, say, would be >>>> much >>>> less of an inconvenience to the development effort. Go takes ~3 mins, >>>> which >>>> means that it is practical to wait until a PR is green before asking >>>> anyone >>>> to look at it. If I need to wait for a Java or Python pre-commit, I task >>>> switch and come back later. If the post-commits are enforced to be >>>> green, >>>> we could possibly gain a much more productive flow at the cost of the >>>> occasional post-commit break, compared to now. Maybe IOs can be less >>>> extensively tested pre-commit, for example, or only if actually changed? >>>> >>>> >>> I also like Robert's suggestion of spitting up pre-commits into >>>> something more fine-grained to get a clear partial signal quicker. If we >>>> have an adequate number of Jenkins slots, it might also speed things up >>>> overall. >>>> >>>> >>> Thanks, >>>> >>> Henning >>>> >>>> >>> On Fri, May 18, 2018 at 12:30 PM Scott Wegner <sweg...@google.com> >>>> wrote: >>>> >>>> >>>> re: intelligently skipping tests for code that doesn't change (i.e. >>>> Java tests on Python PR): this should be possible. We already have >>>> build-caching enabled in Gradle, but I believe it is local to the git >>>> workspace and doesn't persist between Jenkins runs. >>>> >>>> >>>> With a quick search, I see there is a Jenkins Build Cacher Plugin >>>> [1] >>>> that hooks into Gradle build cache and does exactly what we need. Does >>>> anybody know whether we could get this enabled on our Jenkins? >>>> >>>> >>>> [1] https://wiki.jenkins.io/display/JENKINS/Job+Cacher+Plugin >>>> >>>> >>>> On Fri, May 18, 2018 at 12:08 PM Robert Bradshaw < >>>> rober...@google.com> >>>> wrote: >>>> >>>> >>>>> [somehow my email got garbled...] >>>> >>>> >>>>> Now that we're using gradle, perhaps we could be more intelligent >>>> about only running the affected tests? E.g. when you touch Python (or >>>> Go) >>>> you shouldn't need to run the Java precommit at all, which would reduce >>>> the >>>> latency for those PRs and also the time spent in queue. Presumably this >>>> could even be applied per-module for the Java tests. (Maybe a large, >>>> shared >>>> build cache could help here as well...) >>>> >>>> >>>>> I also wouldn't be opposed to a quicker immediate signal, plus >>>> more >>>> extensive tests before actually merging. It's also nice to not have to >>>> wait >>>> an hour to see that you have a lint error; quick stuff like that could >>>> be >>>> signaled quickly before a contributor looses context. >>>> >>>> >>>>> - Robert >>>> >>>> >>>> >>>> >>>>> On Fri, May 18, 2018 at 5:55 AM Kenneth Knowles <k...@google.com> >>>> wrote: >>>> >>>> >>>>>> I like the idea. I think it is a good time for the project to >>>> start >>>> tracking this and keeping it usable. >>>> >>>> >>>>>> Certainly 2 hours is more than enough, is that not so? The Java >>>> precommit seems to take <=40 minutes while Python takes ~20 and Go is so >>>> fast it doesn't matter. Do we have enough stragglers that we don't make >>>> it >>>> in the 95th percentile? Is the time spent in the Jenkins queue? >>>> >>>> >>>>>> For our current coverage, I'd be willing to go for: >>>> >>>> >>>>>> - 1 hr hard cap (someone better at stats could choose %ile) >>>> >>>>>> - roll back or remove test from precommit if fix looks like >>>> more >>>> than 1 week (roll back if it is perf degradation, remove test from >>>> precommit if it is additional coverage that just doesn't fit in the >>>> time) >>>> >>>> >>>>>> There's a longer-term issue that doing a full build each time is >>>> expected to linearly scale up with the size of our repo (it is the >>>> monorepo >>>> problem but for a minirepo) so there is no cap that is feasible until we >>>> have effective cross-build caching. And my long-term goal would be <30 >>>> minutes. At the latency of opening a pull request and then checking your >>>> email that's not burdensome, but an hour is. >>>> >>>> >>>>>> Kenn >>>> >>>> >>>>>> On Thu, May 17, 2018 at 6:54 PM Udi Meiri <eh...@google.com> >>>> wrote: >>>> >>>> >>>>>>> HI, >>>> >>>>>>> I have a proposal to improve contributor experience by keeping >>>> precommit times low. >>>> >>>> >>>>>>> I'm looking to get community consensus and approval about: >>>> >>>>>>> 1. How long should precommits take. 2 hours @95th percentile >>>> over >>>> the past 4 weeks is the current proposal. >>>> >>>>>>> 2. The process for dealing with slowness. Do we: fix, roll back, >>>> remove a test from precommit? >>>> >>>>>>> Rolling back if a fix is estimated to take longer than 2 weeks >>>> is >>>> the current proposal. >>>> >>>> >>>> >>>> https://docs.google.com/document/d/1udtvggmS2LTMmdwjEtZCcUQy6aQAiYTI3OrTP8CLfJM/edit?usp=sharing >>>> >>>
smime.p7s
Description: S/MIME Cryptographic Signature