Re: [Essentials] Collecting usage telemetry from Jenkins Pipeline

Samuel Van Oort Mon, 19 Mar 2018 16:11:10 -0700

I'm glad my mega-reply did turn up after all (just belatedly). Anyway, 
replies inline!


The existing Metrics plugin is using the Dropwizard interface, so:
 

> Suffice it to say, there isn't a world in which I wouldn't use statsd for 
> this :) My current thinking is to incorporate the Metrics plugin 
> (https://plugins.jenkins.io/metrics) in order to provide the appropriate 
> interfaces, and if that's fine, then I would have no qualms with that 
> becoming a dependency of Pipeline itself. I need to do some research on 
> what Dropwizard baggage might be unnecessarily added into Jenkins. 


That basically means "just make it a normal Metric", or at least that's my 
intent.  I think it might make sense to make extra metrics like this not a 
*dependency* of Pipeline (since the metrics code would itself need to 
*depend* on parts of Pipeline), but an additional plugin in the Aggregator 
or part of some sort of Essentials plugin.  I probably explained that 
incoherently because my head is still buried in the gnarly guts of 
workflow-cps. 

As far as statsd goes: the Metrics interfaces are reporter-agnostic, so 
they don't care whether you're sending metrics to Graphite, some sort of 
proprietary analytics solution, StatsD, etc. -- as long as you have a 
Reporter implementation.  The Metrics Graphite Plugin gives some idea of how 
simple it can be to wrap an existing Reporter implementation with 
configuration for Jenkins: https://github.com/jenkinsci/metrics-graphite-plugin
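
To illustrate what "reporter-agnostic" means here, below is a minimal 
plain-Java sketch.  It is NOT the Dropwizard or Metrics plugin API: 
`SimpleRegistry`, `SimpleReporter`, `StatsdStyleReporter` and the metric name 
are hypothetical stand-ins.  The only point is that the code recording a 
metric never knows which backend ends up receiving it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a reporter: receives a snapshot of all counters.
interface SimpleReporter {
    void report(Map<String, Long> snapshot);
}

// Hypothetical stand-in for a metric registry. Not the Dropwizard API.
final class SimpleRegistry {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Instrumented code only calls this; it never sees a reporter.
    void increment(String name) {
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    // Copy current values into a sorted map so reporters see a stable order.
    Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        counters.forEach((name, adder) -> out.put(name, adder.sum()));
        return out;
    }
}

// One possible backend: formats lines in the StatsD counter style "name:value|c".
final class StatsdStyleReporter implements SimpleReporter {
    final List<String> lines = new ArrayList<>();

    @Override
    public void report(Map<String, Long> snapshot) {
        snapshot.forEach((name, value) -> lines.add(name + ":" + value + "|c"));
    }
}

final class ReporterDemo {
    public static void main(String[] args) {
        SimpleRegistry registry = new SimpleRegistry();
        registry.increment("pipeline.runs");
        registry.increment("pipeline.runs");

        // Swapping in a different SimpleReporter would not touch the code above.
        StatsdStyleReporter reporter = new StatsdStyleReporter();
        reporter.report(registry.snapshot());
        reporter.lines.forEach(System.out::println);
    }
}
```

A Graphite-flavored reporter would be another small class implementing the 
same interface, which is roughly the shape of what the Graphite plugin wraps 
with Jenkins configuration.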

*To Jesse:*

> Actually you _can_ get this from the flow graph already. You just 
> count `StepNode`s with no `BodyInvocationAction`. (cf. 
> `StepStartNode.isBody`)


That could work; I'd have to take a closer look to confirm.

Otherwise, the good points and technical corrections about practical 
implementation are noted (bear in mind I was trying to get a lot down 
quickly)... though I think Jesse might be the only one who calls the 
following "easy":


> Easy -- patch `CpsFlowExecution.start` to create a proxy `Invoker` that 
> counts the different kinds of calls. This could be included in the 
> current `CpsFlowExecution.PipelineTimings`, actually. 

It's worth noting that such an implementation would need to be *extremely* 
lightweight in practice, because any overhead it adds to CPS operations 
would be felt.  As long as it's direct field access and increments (or 
AtomicXX access), that's probably fine.  
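
As a rough illustration of the "direct field access or AtomicXX" idea, here 
is a minimal sketch in plain Java.  `CallKind` and `InvocationCounters` are 
hypothetical names, and this does not touch the real `Invoker` or 
`CpsFlowExecution` classes; it only shows that the per-call cost can be one 
array index plus one atomic add.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical categories of calls a counting proxy might distinguish.
enum CallKind { METHOD_CALL, CONSTRUCTOR_CALL, PROPERTY_GET, PROPERTY_SET }

// One atomic slot per call kind: no maps, no allocation, no locks on the hot path.
final class InvocationCounters {
    private final AtomicLongArray counts = new AtomicLongArray(CallKind.values().length);

    // Hot path: a single atomic increment indexed by the enum ordinal.
    void record(CallKind kind) {
        counts.incrementAndGet(kind.ordinal());
    }

    // Cold path: read by whatever reports the metric.
    long get(CallKind kind) {
        return counts.get(kind.ordinal());
    }
}

final class InvocationCountersDemo {
    public static void main(String[] args) throws InterruptedException {
        InvocationCounters counters = new InvocationCounters();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counters.record(CallKind.METHOD_CALL);
            }
        };
        // Two threads incrementing the same slot concurrently.
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("method calls: " + counters.get(CallKind.METHOD_CALL));
    }
}
```

If contention on a single slot ever showed up in profiles, 
`java.util.concurrent.atomic.LongAdder` would be the usual next thing to try.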

On Monday, March 19, 2018 at 6:32:01 PM UTC-4, R Tyler Croy wrote:
>
> (replies inline) 
>
> On Mon, 19 Mar 2018, Samuel Van Oort wrote: 
>
> > Late to the party because I was heads-down on Pipeline bugs a lot of 
> > Friday, but this is a subject near-and-dear to my heart and in the past 
> > I've discussed what metrics might be interesting since this was an explicit 
> > intent to surface from my Bismuth (Pipeline Graph Analysis APIs).  Some of 
> > these are things I'd wanted to make a weekend project of (including 
> > surfacing the existing workflow-cps performance metrics). 
>
>
> Long reply is long! Thanks for taking the time to respond Sam. Suffice it to 
> say, there isn't a world in which I wouldn't use statsd for this :) My current 
> thinking is to incorporate the Metrics plugin 
> (https://plugins.jenkins.io/metrics) in order to provide the appropriate 
> interfaces, and if that's fine, then I would have no qualms with that becoming 
> a dependency of Pipeline itself. I need to do some research on what Dropwizard 
> baggage might be unnecessarily added into Jenkins. 
>
>
> To many of your inline comments, I do not think there's any problem collecting 
> as much telemetry as you and the other Pipeline developers see fit. My list was 
> mostly what *I* think I need to demonstrate success with Pipeline for Jenkins 
> Essentials, and to understand how Jenkins Essentials is being used in order to 
> guide our future roadmap. 
>
>
>
>
> Cheers 
>
>
> > 
> > We should aim to implement Metrics using the existing Metrics interface 
> > because then that can be fairly easily exported in a variety of ways -- I use 
> > a Graphite Metrics reporter that couples to another metric aggregator/store 
> > for the Pipeline Scalability Lab (some may know it as "Hydra").  Other 
> > *cough* proprietary systems may already consume this format of data.  I 
> > would not be surprised if a StatsD reporter is pretty easy to hack together 
> > using https://github.com/ReadyTalk/metrics-statsd and you get a lot of 
> > goodies "for free." 
> > 
> > The one catch for implementing metrics is that we want to be cautious about 
> > adding too much overhead to the execution process. 
> > 
> > As far as specific metrics: 
> > 
> > > distinct built-in step invocations (i.e. not counting Global Variable 
> > > invocations) 
> > 
> > This can't be measured easily from the flow graph due to the potential to 
> > create multiple block structures for one step.  It COULD be added easily 
> > via a registered new StepListener API in workflow-api (and implemented in 
> > workflow-cps) though.   I think it's valuable. 
> > 
> > > configured Declarative Pipelines, configured Script Pipelines 
> > 
> > We can get all Pipelines (flavor-agnostic) by iterating over WorkflowJob 
> > items.  Not sure how we'd tell Scripted vs. Declarative -- maybe 
> > registering a Listener extension point of some sort?   I see value here. 
> > 
> > I'd *also* like to have a breakdown of which Pipelines have been run in the 
> > last, say, week and month, by type (easy to do by looking at the most recent 
> > build).   That way we know not just which were created but which are in 
> > active use. 
> > 
> > > Pipeline executions 
> > 
> > Rates and counts can be achieved with the existing Metrics Timer type.  I'd 
> > like to see that broken down by Scripted vs. Declarative as well. 
> > 
> > > * Global Shared Pipelines configured 
> > >   * Folder-level Shared Pipelines configured 
> > 
> > Do you mean Shared Library use?  One metric I'd be interested in is how 
> > many shared libraries are used *per-pipeline* -- easy to measure from the 
> > count of LoadedScripts I believe (correct me if there's something I'm 
> > missing here, Jesse). 
> > 
> > > Agents used per-Pipeline 
> > 
> > I think it should be possible to do this easily via flow graph analysis, 
> > looking for WorkspaceActionImpl -- nodes and labels are available there.  We 
> > might want to count total node *uses* (open/close of node blocks) and 
> > distinct nodes used.   
> > 
> > Best to trigger this as a post-build analysis using the RunTrigger -- that way 
> > it's just a quick iteration over the Pipeline. 
> > 
> > > Runtime duration per step invocation 
> > 
> > This is one of the MOST useful metrics I think. 
> > 
> > I already have an implementation used in the Scalability Lab that does this 
> > on a per-flownode basis using the GraphListener (rather than per-step). 
> > This is part of a small utility plugin for metrics used in the scalability 
> > lab (not hosted currently since it's not general-use). 
> > 
> > Doing per-step is somewhat more complex - for many steps, trivial, but for 
> > example for a Retry step there's not a logical way to do it because you get 
> > multiple blocks.  Blocks in general are undefined - do you count the block 
> > *contents*, just the start, just the end, or start+end nodes?   Also 
> > remember that Groovy logic counts against the Step time with the 
> > FlowNodes.  Usually that shouldn't be a huge issue unless the Groovy is 
> > complex.   
> > 
> > If that's too noisy there might be ways to insert Listeners for the Step 
> > itself (more complex though) -- I think using the FlowNodes is good enough 
> > for now and gives us a solid first-order approximation that is useful 99% 
> > of the time. 
> > 
> > I would also like to extend this by breaking it down into separate metrics 
> > per step type, i.e. runtime for sh, runtime for echo, for 'node', etc.   
> > This is easier than you'd think since you can fetch the StepDescriptor and 
> > call getFunctionName to get a unique metric key for the step.   This is far 
> > more useful to us than just average step timings, because it helps spot 
> > performance regressions in the field. 
> > 
> > Other aggregates of interest: total time spent in each step type for the 
> > pipeline and counts of the FlowNode by step per pipeline.  This will show 
> > if we're spending (for example) a LOT of time running 
> > readFile/writeFile/dir steps due to some sudden bottleneck in the remoting 
> > interaction and also reveal which step types are used most often.   Knowing 
> > which steps are used heavily helps me know which deserve extra priority for 
> > bugfixes, features, and optimizations. 
> > 
> > It actually *sounds* far more complicated than it really is -- this would 
> > be a pretty trivial afternoon project I think. 
> > 
> > > Runtime duration per Pipeline 
> > 
> > I already have an implementation.  Same plugin as above.  It's exposed as a 
> > DropWizard histogram as well, so you get rates + aggregate times with 
> > median, mean, etc. 
> > 
> > *Other desired metrics:* I think we want FlowNodes created as a rate per 
> > unit time (I already have an implementation in the same plugin as above). 
> > 
> > If we could find a way I'd really like to have a counter of how many 
> > elements of GroovyCPS logic are run and how many function calls (for 
> > off-master you obviously wouldn't get this data).  This is something useful 
> > for measuring the real complexity of their Groovy -- even better than 
> > Liam's Cyclomatic Complexity metric because it directly tracks runtime 
> > operations, not just code structure.   I have notions how we'd accomplish 
> > this. 
> > 
> > On Friday, March 16, 2018 at 6:55:58 PM UTC-4, Andrew Bayer wrote: 
> > > 
> > > It's a normal step - what I'm talking about is counting Pipelines 
> > > containing one or more script blocks, i.e., what percentage of total 
> > > Declarative Pipelines use script blocks, which I think is a more useful 
> > > metric than just how many script block invocations there are. 
> > > 
> > > A. 
> > > 
> > > On Fri, Mar 16, 2018 at 5:32 PM R. Tyler Croy wrote: 
> > > 
> > >> (replies inline) 
> > >> 
> > >> On Fri, 16 Mar 2018, Andrew Bayer wrote: 
> > >> 
> > >> > If we're going to be tracking step invocations anyway, it'd be interesting 
> > >> > to count the number of Declarative Pipelines with a script block, maybe? 
> > >> 
> > >> I kind of assumed that if we were incrementing a counter on step invocations 
> > >> that script{} would be collected already by the machinery, e.g. isn't it "just" 
> > >> a step? 
> > >> 
> > >> If it's a special snowflake then I'll make sure to include it in my 
> > >> design. 
> > >> 
> > >> 
> > >> A few more which come to mind now that I'm thinking about Script: 
> > >> 
> > >>  * Count of stages per Pipeline 
> > >>  * Count of Pipelines with the Groovy sandbox disabled 
> > >>  * Time spent in script{} block 
> > >> 
> > >> 
> > >> Thanks for the ideas abayer! 
> > >> 
> > >> 
> > >> Cheers 
> > >> - R. Tyler Croy 
> > >> 
> > >> ------------------------------------------------------ 
> > >>      Code: <https://github.com/rtyler> 
> > >>   Chatter: <https://twitter.com/agentdero> 
> > >>      xmpp: [email protected] 
> > >> 
> > >>   % gpg --keyserver keys.gnupg.net --recv-key 1426C7DC3F51E16F 
> > >> ------------------------------------------------------ 
> > >> 
> > >> -- 
> > >> You received this message because you are subscribed to the Google Groups 
> > >> "Jenkins Developers" group. 
> > >> To unsubscribe from this group and stop receiving emails from it, send an 
> > >> email to [email protected]. 
> > >> To view this discussion on the web visit 
> > >> https://groups.google.com/d/msgid/jenkinsci-dev/20180316213201.ckenekkqcbgtsuzx%40blackberry.coupleofllamas.com. 
> > >> For more options, visit https://groups.google.com/d/optout. 
> > >> 
> > > 
> > 
>
>
> - R. Tyler Croy 
>
>

