> > I think that will change a lot on a per-project basis, if we can get that
> > info per job, with grafana then we can aggregate and create secondary
> > stats (like builds per hour as you say).
> > So I'd say just to collect the 'bare' data, like job built event, job
> > ended, duration and such.
agree. will need to improve that, right now it 'pulls' each X seconds via the
CLI, instead of Jenkins sending the events, so it is limited to what the CLI
can provide and not that efficient. I plan to install [1] and do the opposite
(Jenkins will send a POST request with the data on each build event, and then
it would be sent to graphite).

> Have you checked the current ds fabric checks?
> There are already a bunch of fabric tasks that monitor jenkins, if we install
> the nagiosgraph (see ds for details) to send the nagios performance data into
> graphite, we can use them as is to also start alarms and such

Icinga2 has integrated graphite support, so after the upgrade we will get all
of our alarms data sent to graphite 'out-of-the-box'.

> dcaro@akhos$ fab -l | grep nagi
>     do.jenkins.nagios.check_build_load   Checks if the bui...
>     do.jenkins.nagios.check_executors    Checks if the exe...
>     do.jenkins.nagios.check_queue        Check if the buil...
>     do.provision.nagios_check            Show a summary of...
>
> Though those will not give you the bare data (were designed with nagios in
> mind, not graphite, so they are just checks, the stats were added later)
>
> There's also a bunch of helper functions to create nagios checks too.

cool, wasn't aware of those fabric checks. I think for simple metrics (loads
and such) we could use that (i.e. query Jenkins from fabric), but for more
complicated queries we'd need to query graphite itself. With this [2] I could
create scripts that query graphite and trigger Icinga alerts, such as:
calculate the 'expected' slaves load for the next hour (in graphite) and then:
Icinga queries graphite -> triggers another Icinga alert -> triggers custom
script (such as a fab task to create slaves).

for now, added two more metrics: top 10 jobs in past X time, and avg number of
builds running / builds waiting in queue in the past X time.
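for reference, the 'builds waiting per label' counting is roughly along these
lines (a simplified sketch, not the actual collector code - the real thing
uses python-jenkins and graphite-send, and the 'assignedLabel' key below is an
illustrative name, not necessarily the exact Jenkins API field):

```python
from collections import Counter

def queue_waiting_per_label(queue_items):
    """Count queued builds per slave label.

    queue_items: list of dicts representing Jenkins build-queue entries.
    We only assume an optional 'assignedLabel' key (illustrative name);
    unrestricted jobs are counted under 'No label'.
    """
    counts = Counter()
    for item in queue_items:
        label = item.get('assignedLabel') or 'No label'
        # Compound labels like '(fc21 || fc22)' are treated as plain
        # strings, matching the current behaviour described above.
        counts[label] += 1
    return dict(counts)
```

each (label, count) pair would then be pushed to graphite under something like
jenkins.queue.<label> on every poll.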
some metrics might 'glitch' from time to time, as there is not a lot of data
yet, and it mainly counts integer values while graphite is oriented towards
floats, so the data has to be smoothed (usually with movingAverage()).

[1] https://wiki.jenkins-ci.org/display/JENKINS/Statistics+Notification+Plugin
[2] https://github.com/klen/graphite-beacon

On Fri, Apr 15, 2016 at 9:39 AM, David Caro <[email protected]> wrote:

> On 04/15 01:24, Nadav Goldin wrote:
> > Hi,
> > I've created an experimental dashboard for Jenkins at our Grafana instance:
> > http://graphite.phx.ovirt.org/dashboard/db/jenkins-monitoring
> > (if you don't have an account, you can enrol with github/google)
>
> Nice! \o/
>
> > currently it collects the following metrics:
> >
> > 1) How many jobs in the Build Queue are waiting per slaves' label:
> >    for instance: if there are 4 builds of a job that is restricted to 'el7'
> >    and 2 builds of another job which is restricted to 'el7' in the build
> >    queue, we will see 6 for 'el7' in the first graph.
> >    'No label' sums jobs which are waiting but are unrestricted.
> >
> > 2) How many slaves are idle per label.
> >    note that the slave's labels are contained in the job's labels, but not
> >    vice versa, as we allow regex expressions such as (fc21 || fc22). right
> >    now it treats them as simple strings.
> >
> > 3) Total number of online/offline/idle slaves
> >
> > besides the normal monitoring, it can help us:
> > 1) minimize the difference between 'idle' slaves per label and jobs waiting
> >    in the build queue per label. this might be caused by unnecessary
> >    restrictions on the label, or maybe by the 'Throttle Concurrent Builds'
> >    plugin.
> > 2) decide how many VMs and which OS to install on the new hosts.
> > 3) in the future, once we have the 'slave pools' implemented, we could
> >    implement auto-scaling based on thresholds or some other function.
> >
> > 'experimental' - as it still needs to be tested for stability (it is based
> > on python-jenkins and graphite-send), and also more metrics can be added
> > (maybe avg running time per job? builds per hour?) - will be happy to hear.
>
> I think that will change a lot on a per-project basis, if we can get that
> info per job, with grafana then we can aggregate and create secondary stats
> (like builds per hour as you say).
> So I'd say just to collect the 'bare' data, like job built event, job ended,
> duration and such.
>
> > I plan later to pack it all into independent fabric tasks (i.e. fab
> > do.jenkins.slaves.show)
>
> Have you checked the current ds fabric checks?
> There are already a bunch of fabric tasks that monitor jenkins, if we install
> the nagiosgraph (see ds for details) to send the nagios performance data into
> graphite, we can use them as is to also start alarms and such.
>
> dcaro@akhos$ fab -l | grep nagi
>     do.jenkins.nagios.check_build_load   Checks if the bui...
>     do.jenkins.nagios.check_executors    Checks if the exe...
>     do.jenkins.nagios.check_queue        Check if the buil...
>     do.provision.nagios_check            Show a summary of...
>
> Though those will not give you the bare data (were designed with nagios in
> mind, not graphite, so they are just checks, the stats were added later)
>
> There's also a bunch of helper functions to create nagios checks too.
>
> > Nadav
>
> > _______________________________________________
> > Infra mailing list
> > [email protected]
> > http://lists.ovirt.org/mailman/listinfo/infra
>
> --
> David Caro
>
> Red Hat S.L.
> Continuous Integration Engineer - EMEA ENG Virtualization R&D
>
> Tel.: +420 532 294 605
> Email: [email protected]
> IRC: dcaro|dcaroest@{freenode|oftc|redhat}
> Web: www.redhat.com
> RHT Global #: 82-62605
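P.S. for the curious, the movingAverage() smoothing mentioned above does
roughly the following (a hand-rolled illustration of what the graphite
function computes, not graphite's actual implementation):

```python
def moving_average(values, window):
    """Smooth a series of samples the way graphite's movingAverage()
    does conceptually: each point becomes the mean of the last `window`
    points (fewer at the very start of the series).

    Illustrative sketch only - not graphite's real code.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

this is why the spiky integer queue counts (0, 4, 0, 4, ...) come out as
smoother float values on the graphs.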
_______________________________________________
Infra mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/infra
