That's looking pretty nice! - fabian
On Wed, Apr 20, 2016 at 1:48 PM, Eyal Edri <[email protected]> wrote:
> Looks amazing!
>
> Just adding a screenshot so people will see how nice it is :) [1]
>
> [1]
> http://graphite.phx.ovirt.org/dashboard/snapshot/qvdhPL34HoUpygGldklmT5uLUnRyj19E
>
>
> On Mon, Apr 18, 2016 at 10:36 AM, David Caro <[email protected]> wrote:
>
>> On 04/17 23:55, Nadav Goldin wrote:
>> > >
>> > > I think that will change a lot per-project basis, if we can get that
>> > > info per job, with grafana then we can aggregate and create secondary
>> > > stats (like builds per hour as you say).
>> > > So I'd say just to collect the 'bare' data, like job built event, job
>> > > ended, duration and such.
>> >
>> > agree. will need to improve that, right now it 'pulls' each X seconds
>> > via the CLI, instead of Jenkins sending the events, so it is limited to
>> > what the CLI can provide and not that efficient. I plan to install [1]
>> > and do the opposite (Jenkins will send a POST request with the data on
>> > each build event and then it would be sent to graphite)
>>
>> Amarchuk had already some ideas on integrating collectd with jenkins, imo
>> that will work well for 'master' related stats and more difficult for
>> others like job started, etc. but worth looking at it
>>
>> >
>> > > Have you checked the current ds fabric checks?
>> > > There are already a bunch of fabric tasks that monitor jenkins, if we
>> > > install the nagiosgraph (see ds for details) to send the nagios
>> > > performance data into graphite, we can use them as is to also start
>> > > alarms and such
>> > >
>> > Icinga2 has integrated graphite support, so after the upgrade we will
>> > get all of our alarms data sent to graphite 'out-of-the-box'.
>>
>> +1!
>>
>> >
>> > >
>> > > dcaro@akhos$ fab -l | grep nagi
>> > >     do.jenkins.nagios.check_build_load    Checks if the bui...
>> > >     do.jenkins.nagios.check_executors     Checks if the exe...
>> > >     do.jenkins.nagios.check_queue         Check if the buil...
>> > >     do.provision.nagios_check             Show a summary of...
>> > >
>> > > Though those will not give you the bare data (were designed with
>> > > nagios in mind, not graphite so they are just checks, the stats were
>> > > added later)
>> > >
>> > > There's also a bunch of helpers functions to create nagios checks too.
>> > >
>> >
>> > cool, wasn't aware of those fabric checks.
>> > I think for simple metrics (loads and such) we could use that (i.e.
>> > query Jenkins from fabric)
>> > but for more complicated queries we'd need to query graphite itself,
>> > with this [2] I could create scripts that query graphite and trigger
>> > Icinga alerts.
>> > such as: calculate the 'expected' slaves load for the next hour (in
>> > graphite) and then:
>> > Icinga queries graphite -> triggers another Icinga alert -> triggers
>> > custom script (such as fab task to create slaves)
>>
>> I'd be careful with the reactions for now, but yes, that's great.
>>
>> >
>> > for now, added two more metrics: top 10 jobs in past X time, and
>> > avg number of builds running / builds waiting in queue in the past X
>> > time.
>> > some metrics might 'glitch' from time to time as there is not a lot of
>> > data yet and it mainly counts integer values while graphite is oriented
>> > towards floats, so the data has to be smoothed (usually with
>> > movingAverage())
>> >
>> >
>> > [1]
>> > https://wiki.jenkins-ci.org/display/JENKINS/Statistics+Notification+Plugin
>> > [2] https://github.com/klen/graphite-beacon
>> >
>> > On Fri, Apr 15, 2016 at 9:39 AM, David Caro <[email protected]> wrote:
>> >
>> > > On 04/15 01:24, Nadav Goldin wrote:
>> > > > Hi,
>> > > > I've created an experimental dashboard for Jenkins at our Grafana
>> > > > instance:
>> > > > http://graphite.phx.ovirt.org/dashboard/db/jenkins-monitoring
>> > > > (if you don't have an account, you can enrol with github/google)
>> > >
>> > > Nice! \o/
>> > >
>> > > >
>> > > > currently it collects the following metrics:
>> > > > 1) How many jobs in the Build Queue are waiting per slaves' label:
>> > > >
>> > > > for instance: if there are 4 builds of a job that is restricted to
>> > > > 'el7' and 2 builds of another job which is restricted to 'el7' in
>> > > > the build queue we will see 6 for 'el7' in the first graph.
>> > > > 'No label' sums jobs which are waiting but are unrestricted.
>> > > >
>> > > > 2) How many slaves are idle per label.
>> > > > note that the slave's labels are contained in the job's labels, but
>> > > > not vice versa, as we allow regex expressions such as
>> > > > (fc21 || fc22 ). right now it treats them as simple strings.
>> > > >
>> > > > 3) Total number of online/offline/idle slaves
>> > > >
>> > > > besides the normal monitoring, it can help us:
>> > > > 1) minimize the difference between 'idle' slaves per label and jobs
>> > > > waiting in the build queue per label.
>> > > > this might be caused by unnecessary restrictions on the label, or
>> > > > maybe by the 'Throttle Concurrent Builds' plugin.
>> > > > 2) decide how many VMs and which OS to install on the new hosts.
>> > > > 3) in the future, once we have the 'slave pools' implemented, we
>> > > > could implement auto-scaling based on thresholds or some other
>> > > > function.
>> > > >
>> > > >
>> > > > 'experimental' - as it still needs to be tested for stability (it is
>> > > > based on python-jenkins and graphite-send) and also more metrics can
>> > > > be added (maybe avg running time per job? builds per hour? ) - will
>> > > > be happy to hear.
>> > >
>> > > I think that will change a lot per-project basis, if we can get that
>> > > info per job, with grafana then we can aggregate and create secondary
>> > > stats (like builds per hour as you say).
>> > > So I'd say just to collect the 'bare' data, like job built event, job
>> > > ended, duration and such.
>> > >
>> > > >
>> > > > I plan later to pack it all into independent fabric tasks (i.e. fab
>> > > > do.jenkins.slaves.show)
>> > >
>> > > Have you checked the current ds fabric checks?
>> > > There are already a bunch of fabric tasks that monitor jenkins, if we
>> > > install the nagiosgraph (see ds for details) to send the nagios
>> > > performance data into graphite, we can use them as is to also start
>> > > alarms and such.
>> > >
>> > > dcaro@akhos$ fab -l | grep nagi
>> > >     do.jenkins.nagios.check_build_load    Checks if the bui...
>> > >     do.jenkins.nagios.check_executors     Checks if the exe...
>> > >     do.jenkins.nagios.check_queue         Check if the buil...
>> > >     do.provision.nagios_check             Show a summary of...
>> > >
>> > > Though those will not give you the bare data (were designed with
>> > > nagios in mind, not graphite so they are just checks, the stats were
>> > > added later)
>> > >
>> > > There's also a bunch of helpers functions to create nagios checks too.
>> > >
>> > > >
>> > > > Nadav
>> > >
>> > > --
>> > > David Caro
>> > >
>> > > Red Hat S.L.
>> > > Continuous Integration Engineer - EMEA ENG Virtualization R&D
>> > >
>> > > Tel.: +420 532 294 605
>> > > Email: [email protected]
>> > > IRC: dcaro|dcaroest@{freenode|oftc|redhat}
>> > > Web: www.redhat.com
>> > > RHT Global #: 82-62605
>> > >
>>
>> --
>> David Caro
>>
>> Red Hat S.L.
>> Continuous Integration Engineer - EMEA ENG Virtualization R&D
>>
>> Tel.: +420 532 294 605
>> Email: [email protected]
>> IRC: dcaro|dcaroest@{freenode|oftc|redhat}
>> Web: www.redhat.com
>> RHT Global #: 82-62605
>>
>
> --
> Eyal Edri
> Associate Manager
> RHEV DevOps
> EMEA ENG Virtualization R&D
> Red Hat Israel
>
> phone: +972-9-7692018
> irc: eedri (on #tlv #rhev-dev #rhev-integ)
>

--
Fabian Deutsch <[email protected]>
RHEV Hypervisor
Red Hat
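
For readers following the thread, here is a rough sketch of the first metric discussed above (builds waiting in the queue, grouped by label). It is not the script Nadav wrote. The input shape (a list of dicts with a 'label' key), the 'jenkins.queue' metric prefix and all function names are assumptions made up for this example. The only outside fact it relies on is Graphite's plaintext protocol, where each data point is a line of the form 'metric.path value timestamp'.

```python
import re
import time
from collections import Counter


def sanitize_label(label):
    """Turn a Jenkins label expression into a safe Graphite path component."""
    if not label:
        # Unrestricted builds, matching the 'No label' series in the thread.
        return "no_label"
    # Expressions such as "(fc21 || fc22 )" are treated as plain strings;
    # runs of characters that are awkward in a metric path become "_".
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", label).strip("_")
    return cleaned or "no_label"


def count_queue_by_label(queue_items):
    """Count waiting builds per label (hypothetical input: dicts with 'label')."""
    counts = Counter(sanitize_label(item.get("label")) for item in queue_items)
    return dict(counts)


def to_graphite_lines(counts, prefix="jenkins.queue", timestamp=None):
    """Format counts as Graphite plaintext lines: 'path value timestamp'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return [
        "%s.%s %d %d" % (prefix, name, value, ts)
        for name, value in sorted(counts.items())
    ]


if __name__ == "__main__":
    # The example from the thread: 4 builds of one job plus 2 builds of
    # another, both restricted to 'el7', and one unrestricted build.
    queue = (
        [{"job": "job_a", "label": "el7"}] * 4
        + [{"job": "job_b", "label": "el7"}] * 2
        + [{"job": "job_c", "label": None}]
    )
    for line in to_graphite_lines(count_queue_by_label(queue)):
        print(line)
```

To actually ship the data, each line would be written, newline-terminated, to the carbon plaintext listener (TCP port 2003 by default). Keeping the counting and formatting separate from the network send makes the logic easy to check without a running Jenkins or Graphite.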
_______________________________________________ Infra mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/infra
