Looks amazing! Just adding a screenshot so people will see how nice it is :) [1]
[1] http://graphite.phx.ovirt.org/dashboard/snapshot/qvdhPL34HoUpygGldklmT5uLUnRyj19E

On Mon, Apr 18, 2016 at 10:36 AM, David Caro <[email protected]> wrote:

> On 04/17 23:55, Nadav Goldin wrote:
> > >
> > > I think that will change a lot per-project basis, if we can get that info
> > > per job, with grafana then we can aggregate and create secondary stats
> > > (like builds per hour as you say).
> > > So I'd say just to collect the 'bare' data, like job built event, job
> > > ended, duration and such.
> >
> > agree. will need to improve that, right now it 'pulls' each X seconds via
> > the CLI, instead of Jenkins sending the events, so it is limited to what
> > the CLI can provide and not that efficient. I plan to install [1] and do
> > the opposite (Jenkins will send a POST request with the data on each build
> > event and then it would be sent to graphite)
>
> Amarchuk had already some ideas on integrating collectd with jenkins, imo
> that will work well for 'master' related stats and more difficult for others
> like job started, etc. but worth looking at it
>
> > >
> > > Have you checked the current ds fabric checks?
> > > There are already a bunch of fabric tasks that monitor jenkins, if we
> > > install the nagiosgraph (see ds for details) to send the nagios
> > > performance data into graphite, we can use them as is to also start
> > > alarms and such
> > >
> > Icinga2 has integrated graphite support, so after the upgrade we will
> > get all of our alarms data sent to graphite 'out-of-the-box'.
>
> +1!
>
> > >
> > > dcaro@akhos$ fab -l | grep nagi
> > > do.jenkins.nagios.check_build_load    Checks if the bui...
> > > do.jenkins.nagios.check_executors     Checks if the exe...
> > > do.jenkins.nagios.check_queue         Check if the buil...
> > > do.provision.nagios_check             Show a summary of...
> > >
> > > Though those will not give you the bare data (were designed with nagios
> > > in mind, not graphite so they are just checks, the stats were added later)
> > >
> > > There's also a bunch of helper functions to create nagios checks too.
> > >
> >
> > cool, wasn't aware of those fabric checks.
> > I think for simple metrics (loads and such) we could use that (i.e. query
> > Jenkins from fabric)
> > but for more complicated queries we'd need to query graphite itself,
> > with this [2] I could create scripts that query graphite and trigger
> > Icinga alerts.
> > such as: calculate the 'expected' slaves load for the next hour (in
> > graphite) and then:
> > Icinga queries graphite -> triggers another Icinga alert -> triggers
> > custom script (such as fab task to create slaves)
>
> I'd be careful with the reactions for now, but yes, that's great.
>
> >
> > for now, added two more metrics: top 10 jobs in past X time, and
> > avg number of builds running / builds waiting in queue in the past X time.
> > some metrics might 'glitch' from time to time as there is not a lot of
> > data yet
> > and it mainly counts integer values while graphite is oriented towards
> > floats, so the data has to be smoothed (usually with movingAverage())
> >
> >
> >
> > [1] https://wiki.jenkins-ci.org/display/JENKINS/Statistics+Notification+Plugin
> > [2] https://github.com/klen/graphite-beacon
> >
> > On Fri, Apr 15, 2016 at 9:39 AM, David Caro <[email protected]> wrote:
> >
> > > On 04/15 01:24, Nadav Goldin wrote:
> > > > Hi,
> > > > I've created an experimental dashboard for Jenkins at our Grafana
> > > > instance:
> > > > http://graphite.phx.ovirt.org/dashboard/db/jenkins-monitoring
> > > > (if you don't have an account, you can enrol with github/google)
> > >
> > > Nice! \o/
> > >
> > > >
> > > > currently it collects the following metrics:
> > > > 1) How many jobs in the Build Queue are waiting per slaves' label:
> > > >
> > > > for instance: if there are 4 builds of a job that is restricted to
> > > > 'el7' and 2 builds of another job which is restricted to 'el7' in the
> > > > build queue we will see 6 for 'el7' in the first graph.
> > > > 'No label' sums jobs which are waiting but are unrestricted.
> > > >
> > > > 2) How many slaves are idle per label.
> > > > note that the slave's labels are contained in the job's labels, but
> > > > not vice versa, as we allow regex expressions such as (fc21 || fc22 ).
> > > > right now it treats them as simple strings.
> > > >
> > > > 3) Total number of online/offline/idle slaves
> > > >
> > > > besides the normal monitoring, it can help us:
> > > > 1) minimize the difference between 'idle' slaves per label and jobs
> > > > waiting in the build queue per label.
> > > > this might be caused by unnecessary restrictions on the label, or
> > > > maybe by the 'Throttle Concurrent Builds' plugin.
> > > > 2) decide how many VMs and which OS to install on the new hosts.
> > > > 3) in the future, once we have the 'slave pools' implemented, we could
> > > > implement auto-scaling based on thresholds or some other function.
> > > >
> > > >
> > > > 'experimental' - as it still needs to be tested for stability (it is
> > > > based on python-jenkins and graphite-send) and also more metrics can
> > > > be added (maybe avg running time per job? builds per hour? ) - will be
> > > > happy to hear.
> > >
> > > I think that will change a lot per-project basis, if we can get that info
> > > per job, with grafana then we can aggregate and create secondary stats
> > > (like builds per hour as you say).
> > > So I'd say just to collect the 'bare' data, like job built event, job
> > > ended, duration and such.
> > >
> > > >
> > > > I plan later to pack it all into independent fabric tasks (i.e. fab
> > > > do.jenkins.slaves.show)
> > >
> > > Have you checked the current ds fabric checks?
> > > There are already a bunch of fabric tasks that monitor jenkins, if we
> > > install the nagiosgraph (see ds for details) to send the nagios
> > > performance data into graphite, we can use them as is to also start
> > > alarms and such.
> > >
> > > dcaro@akhos$ fab -l | grep nagi
> > > do.jenkins.nagios.check_build_load    Checks if the bui...
> > > do.jenkins.nagios.check_executors     Checks if the exe...
> > > do.jenkins.nagios.check_queue         Check if the buil...
> > > do.provision.nagios_check             Show a summary of...
> > >
> > > Though those will not give you the bare data (were designed with nagios
> > > in mind, not graphite so they are just checks, the stats were added later)
> > >
> > > There's also a bunch of helper functions to create nagios checks too.
> > >
> > > >
> > > >
> > > > Nadav
> > >
> > > > _______________________________________________
> > > > Infra mailing list
> > > > [email protected]
> > > > http://lists.ovirt.org/mailman/listinfo/infra
> > >
> > >
> > > --
> > > David Caro
> > >
> > > Red Hat S.L.
> > > Continuous Integration Engineer - EMEA ENG Virtualization R&D
> > >
> > > Tel.: +420 532 294 605
> > > Email: [email protected]
> > > IRC: dcaro|dcaroest@{freenode|oftc|redhat}
> > > Web: www.redhat.com
> > > RHT Global #: 82-62605
> > >
>
> > _______________________________________________
> > Infra mailing list
> > [email protected]
> > http://lists.ovirt.org/mailman/listinfo/infra
>
>
> --
> David Caro
>
> Red Hat S.L.
> Continuous Integration Engineer - EMEA ENG Virtualization R&D
>
> Tel.: +420 532 294 605
> Email: [email protected]
> IRC: dcaro|dcaroest@{freenode|oftc|redhat}
> Web: www.redhat.com
> RHT Global #: 82-62605
>
> _______________________________________________
> Infra mailing list
> [email protected]
> http://lists.ovirt.org/mailman/listinfo/infra
>
>

--
Eyal Edri
Associate Manager
RHEV DevOps
EMEA ENG Virtualization R&D
Red Hat Israel

phone: +972-9-7692018
irc: eedri (on #tlv #rhev-dev #rhev-integ)
_______________________________________________
Infra mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/infra
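
For readers following the thread, here is a rough sketch of the first metric Nadav describes: builds waiting in the queue, counted per label, with unrestricted jobs summed under 'No label'. It is not the collector from the thread (that one is based on python-jenkins and graphite-send). The function names, the `jenkins.queue.label` metric prefix and the input shape (a plain list of label strings, or None for unrestricted jobs) are all assumptions made for illustration. The output uses Graphite's plaintext format, one `<path> <value> <timestamp>` entry per line.

```python
import time
from collections import Counter


def count_queue_by_label(queued_labels):
    """Count waiting builds per label; None or '' means unrestricted."""
    counts = Counter()
    for label in queued_labels:
        # Labels are treated as simple strings, as in the thread, so an
        # expression like '(fc21 || fc22)' ends up in its own bucket.
        counts[label if label else "No label"] += 1
    return dict(counts)


def to_graphite_lines(counts, prefix="jenkins.queue.label", timestamp=None):
    """Format counts as Graphite plaintext lines: '<path> <value> <timestamp>'.

    The prefix is a hypothetical metric path, not one used by the real dashboard.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    lines = []
    for label, value in sorted(counts.items()):
        # Graphite paths are dot-separated, so anything that is not
        # alphanumeric, '-' or '_' is replaced to keep one path segment.
        safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in label)
        lines.append(f"{prefix}.{safe} {value} {ts}")
    return lines


if __name__ == "__main__":
    # 4 builds of one 'el7' job, 2 of another, one unrestricted, one 'fc22'.
    queue = ["el7"] * 4 + ["el7"] * 2 + [None, "fc22"]
    for line in to_graphite_lines(count_queue_by_label(queue), timestamp=0):
        print(line)
```

With the example from the thread (4 + 2 builds restricted to 'el7'), the 'el7' series gets the value 6. The resulting lines could then be written to a carbon plaintext listener (port 2003 by default); that part is left out so the sketch runs without a Graphite server.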
