Thanks for reaching out, and sorry for missing you earlier today.

2016-03-08 10:37 GMT-08:00 Payal Priyadarshini <[email protected]>:

> Hey,
>
>
> This is Payal Priyadarshini, a final-year undergraduate student enrolled
> in the dual degree (B.Tech+M.Tech) programme of the CSE department at the
> Indian Institute of Technology Kharagpur. I have gone through the idea list
> of the projects offered, and I am interested in the Usage Statistics
> Analysis project.
>
> I have experience in the data mining field, where I worked with Meetup
> [an online social networking portal that facilitates offline group
> meetings] data to find out the success of events, group popularity, etc.
>

Great, can you tell us more about that project? We also use meetup.com and
maybe we can form some project idea around that.


>
> Regarding this project, I am going through a few links suggested by Daniel.
>
>
> Link1 : http://stats.jenkins-ci.org/
>
>
> - Jenkins Statistics [
> http://stats.jenkins-ci.org/jenkins-stats/svg/svgs.html]
>
>
>    - The way charts are generated can be improved. For example, instead of
>    a linear axis we could use a logarithmic axis so that it covers a larger
>    range, plot the size distributions, etc. I didn't understand what
>    "nodes" represents, though.
>
>
Think of a node as a member of a cluster. The chart tracks the combined
total size of all Jenkins clusters across all installations.

The use cases for the stats graphs we have today pretty much come down to:

   - "I want to feel good looking at a graph that's growing up, up and up"
   - "I want to see the latest Jenkins installation counts / node counts /
   job counts /... How do I do that?"
   - "I want to put this chart in my slide" (so I'd rather have a CSV file
   because I know how to plot a graph in Excel and make it look the way I want)

The first use case is the one that we handle very well today :-), but the
other use cases, not so much.
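
To make the CSV use case concrete, here is a minimal sketch in Java. It is
only an illustration under assumptions: the class name StatsCsv, the column
names and the sample numbers are made up, and nothing here is taken from the
existing infra-statistics code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of infra-statistics.
public class StatsCsv {

    /** Renders month -> count pairs as a two-column CSV with a header row. */
    public static String toCsv(Map<String, Long> countsByMonth) {
        StringBuilder sb = new StringBuilder("month,installations\n");
        for (Map.Entry<String, Long> e : countsByMonth.entrySet()) {
            sb.append(e.getKey()).append(',').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values; LinkedHashMap keeps the months in insertion order.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("2016-01", 100L);
        counts.put("2016-02", 120L);
        System.out.print(toCsv(counts));
    }
}
```

A file like this can be opened directly in Excel, which is all the slide
use case really needs.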


>    - A new UI could be developed that makes the usage statistics easier
>    to understand. For example, to show the popularity of plugins in a
>    given time period, we could show only the top 20 plugins, to improve
>    readability in a pie chart or bar graph.
>

If you are interested in looking at the data in other ways, which is
great, then we should first think about what kinds of questions we want the
stats to answer. I'm sure many people have tons of questions; some of mine
are:

   - How many of our users are running Windows, and how many are on Linux?
   Of the Linux users, what is the percentage of Debian family vs. RHEL
   family?
   - What's the distribution of cluster size? Did it change over time? Does
   the age of the installation correlate with the cluster size? If so, what
   does that curve look like?
   - How quickly/often are people upgrading? Are there popular and
   unpopular releases? Can we spot downgrades? Do they correlate with the
   perceived quality of the releases (see the community ratings here
   <https://jenkins-ci.org/changelog/>)? Can we use it to warn us if a
   release seems to be unpopular?

These are all harder questions to answer than what the stats page shows
today, but I think they are more technically interesting for you, and I
think they fit well within the scope of GSoC.
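
As one illustration of the cluster-size question, here is a rough sketch in
Java that groups installations by node count into power-of-two buckets,
which is the same idea as plotting on a logarithmic axis. It rests on
assumptions: the class name ClusterSizeHistogram and the input array are
hypothetical, and how node counts would actually be read from the census
data is not shown because the fields are not documented yet.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of infra-statistics.
public class ClusterSizeHistogram {

    /**
     * Counts installations per power-of-two bucket.
     * The key is the bucket's lower bound: 1 covers size 1, 2 covers 2-3,
     * 4 covers 4-7, 8 covers 8-15, and so on.
     */
    public static Map<Integer, Integer> histogram(int[] nodeCounts) {
        Map<Integer, Integer> buckets = new TreeMap<>();
        for (int n : nodeCounts) {
            if (n < 1) {
                continue; // skip malformed records with no nodes
            }
            int lowerBound = Integer.highestOneBit(n); // largest power of two <= n
            buckets.merge(lowerBound, 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up node counts, one entry per installation.
        int[] sample = {1, 2, 3, 5, 40, 0};
        histogram(sample).forEach((lower, count) ->
                System.out.println(lower + "+ nodes: " + count));
    }
}
```

Running the same bucketing per month would be one way to see whether the
distribution changes over time.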

If you haven't been a Jenkins user and find it difficult to get your head
around what kinds of questions we want the statistics to answer, maybe the
way to go is to go one level meta and find a way to make this data available
for ad hoc queries, so that people with interesting questions can query
the data in some generic language, à la Apache Pig.



> We can maintain a duration window (adjustable by the user) and then show
> stats for that time frame rather than showing the monthly data. A similar
> idea related to processing is already on the wiki page of the project. I
> will elaborate on all these points in my proposal.
>

This would be a great help.

> - Jenkins plugin dependency graph [link
> <http://stats.jenkins-ci.org/jenkins-stats/jenkinsgraph.html?filter=kohsuke>
> ]
>
>    - I think the dependency graph can be exploited to tell which plugins
>    are more likely to be used together. Can someone please clarify what
>    exactly a dependency denotes here? And where can I find the source code
>    for this?
>

Here: <https://github.com/jenkinsci/infra-statistics>


> Census data [link <https://jenkins-ci.org/census/>]
>
>    - What is the metadata/fields for these json files?
>

Yeah, we should document this. Daniel or I will get back to you on that
one. You'll also want to get a sense of the data set size.



> Repo for the current sources [link
> <https://github.com/jenkinsci/infra-statistics>].
>
>    - Which languages other than Groovy can be used by candidates for this
>    project?
>

Java or Groovy would be preferred. That way, we have more people who can
work on it after GSoC ends.


> So, after going through all these links, what should my next step be to
> contribute to this project?
>

I believe we need to drive toward you creating a project plan that you'll
then submit to GSoC. It sounds to me like you still need to get yourself
oriented in what exists, and probably learn a bit about Jenkins: what it
does, who uses it, that sort of thing. I think that'll help you think about
which interesting questions we are trying to answer with data mining. If
you want to hear more brainstorming from me or others, we are happy to
provide it.

In parallel, we'd like to hear from you which specific area you want to
take on; "usage stat analysis" is still too big and vague.

There are also upcoming student office hours that you might be interested in.

> Looking forward to suggestions.
>
> Thanks a lot.
>
> Regards,
> Payal Priyadarshini
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jenkinsci-dev/5550ea02-93b7-45ca-a992-2149311ec881%40googlegroups.com
> <https://groups.google.com/d/msgid/jenkinsci-dev/5550ea02-93b7-45ca-a992-2149311ec881%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Kohsuke Kawaguchi

