On 17 July 2015 at 13:08, Dimiter Naydenov <[email protected]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 17.07.2015 12:07, James Tunnicliffe wrote:
>> /me opens can of worms
> Thanks for starting the discussion :)
>
>>
>> Having spent perhaps too long trying to parallelise the running of
>> the unit test suite over multiple machines using various bat guano
>> crazy ideas, I know too much about this but haven't got an easy
>> fix. I do know the right fix is to re-write the very long tests
>> that we have.
>>
>> If you want to find long running tests, go test -v ./... -check.v
>> is the command to run at top level. You will get a lot of output,
>> but it isn't difficult to find tests that take longer than 10
>> seconds with grep and I am sure I could dig the script out that I
>> wrote that examines the output and tells you all tests over a
>> certain runtime.
>>
>> When you run "go test ./..." at the top of juju/juju it runs suites
>> in parallel. If you have multiple long tests in a suite then it has
>> a significant impact on the total runtime. We have no way with the
>> current tools to exclude single tests without modifying the tests
>> themselves;
>
> How about GOMAXPROCS=1 go test ./... ? Won't that force the runtime to
> run all suites sequentially?

I don't want to run them sequentially; that would be slower. There are several things going on.

First, long tests are bad, but if they have to be long then starting them as soon as possible is good, because it is more efficient to pack big things first and small things last. Think of a bucket: put the big rocks in first and the sand in last, and you can easily level the sand off; put the sand in first and you end up with a lumpy surface.

Second, long tests tend to be ones sitting and waiting for things to happen, so they aren't very CPU intensive. However, if you increase GOMAXPROCS in the hope of using that idle CPU time, you mostly end up making other, timing-dependent tests fail, because you have slowed them down just enough to fail.

Third, the scheduler running the tests seems to (though I haven't looked at the code) run one suite per process, each suite single threaded, in alphabetical directory search order. Since our longer suites tend to be closer to the end of that list than to the start, it doesn't schedule them optimally. I know there is ongoing work to improve the Go scheduler, which may help if it looks at load and not just at the number of active processes.

>> if we did we could run all the tests that take less than a few
>> seconds by maintaining a list of long tests, and run those long
>> tests as a separate, parallel task. The real fix is to put some
>> effort into making the long running tests more unit test and less
>> full stack test. 30+ seconds is not what we want. The least worst
>> idea I have is making a sub-suite for tests that take > 10 seconds,
>> one test per suite, so the standard tools will run them in parallel
>> with everything else. Providing you have many CPUs there is a
>> reasonable chance this will help. It is not remotely nice though.
>
> Using go tool pprof can also help figuring out why certain tests take
> a long time and/or memory. I'm planning to experiment with it and come
> up with some feedback.

I did take a quick look a while ago, but I was a young Juju hacker and a young Go hacker, so I didn't get much further than looking at the numbers and thinking "yep, they are big". I would be very surprised if there were an easy fix for the long-running tests; I expect that testing them in a different way is required. The good news is that the number of long tests is small. These are the long tests, as found by the combination of these two:

http://pastebin.ubuntu.com/11892666/
http://pastebin.ubuntu.com/11892667/

PASS: pinger_test.go:131: mongoPingerSuite.TestAgentConnectionsShutDownWhenStateDies 30.368s
PASS: fetch_test.go:60: FetchSuite.TestRun 9.003s
PASS: fetch_test.go:60: FetchSuite.TestRun 9.002s
PASS: status_test.go:2673: StatusSuite.TestStatusAllFormats 13.327s
PASS: upgradejuju_test.go:308: UpgradeJujuSuite.TestUpgradeJuju 16.219s
PASS: machine_test.go:409: MachineSuite.TestHostUnits 10.795s
PASS: machine_test.go:498: MachineSuite.TestManageEnviron 9.919s
PASS: machine_test.go:1941: mongoSuite.TestStateWorkerDialSetsWriteMajority 12.071s
PASS: unit_test.go:225: UnitSuite.TestUpgradeFailsWithoutTools 10.116s
PASS: bootstrap_test.go:142: bootstrapSuite.TestBootstrapNoToolsDevelopmentConfig 11.892s
PASS: bootstrap_test.go:123: bootstrapSuite.TestBootstrapNoToolsNonReleaseStream 11.623s
PASS: leadership_test.go:130: leadershipSuite.TestClaimLeadership 10.021s
PASS: dblog_test.go:65: dblogSuite.TestMachineAgentWithoutFeatureFlag 10.012s
PASS: dblog_test.go:83: dblogSuite.TestUnitAgentWithoutFeatureFlag 10.060s
PASS: oplog_test.go:26: oplogSuite.TestWithRealOplog 14.208s
PASS: assign_test.go:1259: assignCleanSuite.TestAssignUnitPolicyConcurrently 10.530s
PASS: assign_test.go:1259: assignCleanSuite.TestAssignUnitPolicyConcurrently 10.834s
PASS: state_test.go:189: MultiEnvStateSuite.TestWatchTwoEnvironments 9.766s
PASS: restore_test.go:98: RestoreSuite.TestReplicasetIsReset 11.175s
PASS: initiate_test.go:24: InitiateSuite.TestInitiateReplicaSet 10.075s
PASS: kvm-broker_test.go:403: kvmProvisionerSuite.TestContainerStartedAndStopped 10.056s
PASS: lxc-broker_test.go:1087: lxcProvisionerSuite.TestContainerStartedAndStopped 15.054s
PASS: provisioner_test.go:1095: ProvisionerSuite.TestSetInstanceInfoFailureSetsErrorStatusAndStopsInstanceButKeepsGoing 10.144s
PASS: uniter_test.go:1508: UniterSuite.TestActionEvents 39.614s
PASS: uniter_test.go:1114: UniterSuite.TestUniterRelations 16.092s
PASS: uniter_test.go:970: UniterSuite.TestUniterUpgradeGitConflicts 10.982s

{'quick': 71, 'long': 19, 'ok': 51, 'sub-second': 6970, 'very-long': 26}

>>
>> 0 ✓ dooferlad@homework2
>> ~/dev/go/src/github.com/juju/juju/worker/uniter $ go test -check.v
>>
>> Shorter tests deleted from this list. The longest are:
>> PASS: uniter_test.go:1508: UniterSuite.TestActionEvents 39.711s
>> PASS: uniter_test.go:1114: UniterSuite.TestUniterRelations 16.276s
>> PASS: uniter_test.go:970: UniterSuite.TestUniterUpgradeGitConflicts 11.354s
>>
>> These are worth a look:
>> PASS: uniter_test.go:2053: UniterSuite.TestLeadership 5.146s
>> PASS: util_unix_test.go:103: UniterSuite.TestRunCommand 6.946s
>> PASS: uniter_test.go:2104: UniterSuite.TestStorage 4.593s
>> PASS: uniter_test.go:1367: UniterSuite.TestUniterCollectMetrics 4.102s
>> PASS: uniter_test.go:774: UniterSuite.TestUniterDeployerConversion 6.904s
>> PASS: uniter_test.go:427: UniterSuite.TestUniterDyingReaction 5.772s
>> PASS: uniter_test.go:393: UniterSuite.TestUniterHookSynchronisation 4.546s
>> PASS: uniter_test.go:1274: UniterSuite.TestUniterRelationErrors 4.536s
>> PASS: uniter_test.go:476: UniterSuite.TestUniterSteadyStateUpgrade 6.405s
>> PASS: uniter_test.go:895: UniterSuite.TestUniterUpgradeConflicts 6.430s
>>
>> ok github.com/juju/juju/worker/uniter 175.014s
>>
>> James
>>
>> On 17 July 2015 at 04:59, Tim Penhey <[email protected]>
>> wrote:
>>> Hi Curtis,
>>>
>>> I have been looking at some of the recent cursings from ppc64le,
>>> and the last two included timeouts for the
>>> worker/uniter tests.
>>>
>>> On my machine, amd64, i7, 16 gig ram, I get the following:
>>>
>>> $ time go test
>>> 2015-07-17 03:53:03 WARNING juju.worker.uniter upgrade123.go:26 no uniter state file found for unit unit-mysql-0, skipping uniter upgrade step
>>> OK: 51 passed
>>> PASS
>>> ok github.com/juju/juju/worker/uniter 433.256s
>>>
>>> real 7m24.270s
>>> user 3m18.647s
>>> sys 1m2.472s
>>>
>>> Now let's ignore the logging output that someone should fix;
>>> we can see how long it takes here. Given that gccgo on power is
>>> slower, we are going to do two things:
>>>
>>> 1) increase the timeouts for the uniter
>>>
>>> 2) change the uniter tests
>>>
>>> WRT point 2, most of the uniter tests are actually fully
>>> functional end to end tests, and should not be run every time we
>>> land code.
>>>
>>> They should be moved into the featuretest package.
>>>
>>> Thanks, Tim
>>>
>>> --
>>> Juju-dev mailing list
>>> [email protected]
>>> Modify settings or unsubscribe at:
>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>
>
>
> - --
> Dimiter Naydenov <[email protected]>
> Juju Core Sapphire team <http://juju.ubuntu.com>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
>
> iQEcBAEBAgAGBQJVqPAiAAoJENzxV2TbLzHwIHYIAKLXI2F4V/Jp+3rFLqbOCrgx
> QHTnnnARC7yDE5nbz0nFC/Z6JdEIsG+Xc+JzsaYh+cpZiRTmRvwztSlOyFBq649a
> fpCyUttY7CvPGxf+ul58dkFD2JL7Pv/ZNOAR4vGS6X2IR5y/UohtJVntkh3i68xQ
> +zRNlhmrGs2pxYVTHMPjfO+X83Cv/UNHq/j7upk1jRKXrm4AjjqGS+vQkIvTUJDF
> Y2T8efxFXHnMP5u3qI6yyoE1C8/wjh2AHkNNcVPoAy8ClRVjowOo0UpSH8XV2k89
> PRtA35ON7Xrgrv45SOehuDo7PyeZacop7wp2d+tNKLLV4xi75aKkt7EQUcfmNOk=
> =I+Ar
> -----END PGP SIGNATURE-----
