Good practices for scaling, IMO:

* Don't build on the master.
* Yes, add agents/slaves (see below).
* Never put more than one executor on an agent (the term "slave" is now deprecated). Engineering time is far more expensive than running N agents, and a single executor per agent keeps builds from stepping on each other's toes.
** We initially used static agents running on a corporate ESX. Now builds mostly run on a Docker Swarm cluster, for about 60 hours a day and ~1000 active jobs.
* IMO, go directly to two or three agents rather than just one. That way you may keep your users from designing builds that depend on a specific machine.
** Corollary: never use node names as labels.
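A rough way to see what extra single-executor agents buy you: the sketch below is a minimal Python model, not anything Jenkins itself runs. It assumes independent builds of known length (the 30 war builds at about 3 minutes each from the question quoted below) and hands each build to whichever agent frees up first. The function name `makespan` and the variable names are hypothetical, and real builds add queueing and checkout overhead that this ignores.

```python
import heapq


def makespan(durations, agent_count):
    """Minutes until all builds finish on `agent_count` single-executor agents.

    Assumption: builds are independent and each agent runs one build at a time.
    """
    if agent_count < 1:
        raise ValueError("need at least one agent")
    # Each heap entry is the time at which one agent becomes free.
    free_at = [0] * agent_count
    heapq.heapify(free_at)
    for minutes in durations:
        start = heapq.heappop(free_at)  # earliest available agent
        heapq.heappush(free_at, start + minutes)
    return max(free_at)


# Hypothetical workload taken from the question: 30 builds, 3 minutes each.
war_builds = [3] * 30
for agents in (1, 2, 3):
    print(agents, "agent(s):", makespan(war_builds, agents), "minutes")
```

Under those assumptions, going from one agent to three cuts the batch from 90 to 30 minutes. The model only covers builds that can run in parallel; it says nothing about jobs that have to run in sequence.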
My 2 cents

-- Baptiste

On 27 Jul 2016 at 8:42 PM, "Bruce Epstein" <[email protected]> wrote:

> Hi -
>
> I'm an experienced Jenkins user (writing Ant scripts, using plugins, etc.)
> but not an IT/administrator, and my IT dept is not that familiar with
> Jenkins scaling.
>
> If anyone can point me to a comprehensive discussion of the best way to
> scale, please provide a URL.
>
> Current architecture:
>
> Only one master with just a single executor.
> All jobs are run on the master.
> Running Jenkins 1.652.
> The load is not heavy. We probably never have more than 2 or 3 users
> needing Jenkins at the same time, and usually it is just one. 95% of the
> time, we don't have a scale issue, so I don't want to over-engineer the
> solution.
> We have three or four development teams, and sometimes queue conflicts
> arise. We want to scale up a bit for future growth.
>
> Current problems:
> 1. Some jobs (with three or four sub-jobs) monopolize the queue for 30+
> minutes, preventing other jobs from running. One in particular is a library
> built in response to an svn change, which then triggers four other apps to
> rebuild. These are separate Jenkins jobs and yet they hog the queue,
> preventing other users from running any jobs, even "in between" each app
> being rebuilt.
>
> 2. Some multiconfiguration jobs (that build, say, 30 war files) can take
> about 90 minutes to run (3 minutes per iteration). We'd like to cut that
> down, but at least they allow other jobs to run (i.e. don't monopolize the
> queue). These wars can be built in parallel (no need to run in series,
> which is the default for multiconfiguration jobs, I assume).
>
> Things I've tried:
> 1. No matter how I've tried to configure the queue-hogging job, I can't
> get it to "play nicely". Once it starts, it runs all the way through (say,
> 4 subjobs, each taking about 8 minutes).
> So, configuring the master to use, say, 2 or 3 executors seems to be one
> way to allow other jobs to run without being shut out.
>
> 2. Increasing the number of executors "works" for some use cases, but it
> also seems to cause jobs to run in parallel that I need to run in sequence.
> I'm unclear on how to prevent multiple executors from being used when I
> want one job to wait for another. Is this just how executors work? How do I
> ensure the extra executors are assigned to other jobs and not just used in
> parallel for the queue-hogging job?
>
> Possible solutions:
> 1. Add slaves? (see below)
> 2. Use multiple executors with BuildFlow or similar plugins to prevent
> jobs being triggered to run in parallel? Even BuildFlow seems to require at
> least two executors, or it hangs up trying to launch the first subjob in
> the flow.
>
> Proposed solution:
>
> 1. Stick with only one master. Creating multiple masters seems unnecessary
> at our size.
> 2. Don't build jobs on the master...leave that to the slaves. (This seems
> to be the best practice?)
> 3. Create two slaves eventually (one is enough for now while we are still
> performing builds on master too).
> 4. Configure one slave to use only one executor. Configure the second
> slave to use multiple executors.
> 5. Configure certain jobs to run on the appropriate slave (single-executor
> or multi-executor) depending on the job's needs.
>
> 6. Should I be looking at CloudBees or plugins like EC2, Heavy Job, or
> One-Shot Executor?
>
> I need someone who has "been there, done that" to give me a reality check
> or alert me to any blind spots before I ask IT to acquire more hardware and
> configure it. I want to have some confidence this will solve the problem
> without being overkill.
>
> Any insights appreciated.
>
> In gratitude, I'm happy to answer any Flex questions. :-)
>
> Thanks,
>
> Bruce
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jenkinsci-users/3a038f29-5221-4830-80ae-ca5a70c7ccc7%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
