Thanks for reaching out, and sorry for missing you earlier today.

2016-03-08 10:37 GMT-08:00 Payal Priyadarshini <[email protected]>:
> Hey,
>
> This is Payal Priyadarshini, a final-year undergraduate student enrolled
> in the dual degree (B.Tech+M.Tech) programme of the CSE department of the
> Indian Institute of Technology Kharagpur. I have gone through the list of
> project ideas, and I am interested in the Usage Statistics Analysis project.
>
> I have experience in data mining: I worked with Meetup data (an online
> social networking portal that facilitates offline group meetings) to
> measure the success of events, the popularity of groups, etc.
>

Great, can you tell us more about that project? We also use meetup.com, and
maybe we can form a project idea around that.

>
> Regarding this project, I am going through a few links suggested by Daniel.
>
> Link 1: http://stats.jenkins-ci.org/
>
> - Jenkins Statistics [http://stats.jenkins-ci.org/jenkins-stats/svg/svgs.html]
>
>   - The way the charts are generated could be improved: for example, a
>     logarithmic axis instead of a linear one would cover a larger range,
>     and we could plot the size distributions. However, I didn't understand
>     what "nodes" represents.
>

Think of a node as a member of a cluster. The chart tracks the combined total
size of all the Jenkins clusters across all the installations.

The use cases for the stats graphs we have today pretty much come down to:

- "I want to feel good looking at a graph that's growing up, up and up"
- "I want to see the latest Jenkins installation counts / node counts / job
  counts / ... How do I do that?"
- "I want to put this chart in my slide" (so I'd rather have a CSV file,
  because I know how to plot a graph in Excel and make it look the way I
  want)

The first use case is the one that we handle very well today :-), but the
other use cases, not so much.

>   - A new UI could be developed to make the usage statistics easier to view
>     and understand.
>     For example, to show the popularity of plugins in a given time period,
>     we could show only the top 20 plugins to keep a pie chart or bar graph
>     readable.
>

If you are interested in looking at the data in other ways, which is great,
then we should first think about what kind of questions we want the stats to
answer. I'm sure many people have tons of questions; some of mine are:

- How many of our users are running Windows, and how many Linux? Of Linux,
  what percentage is the Debian family vs. the RHEL family?
- What's the distribution of cluster size? Did it change over time? Does the
  age of an installation correlate with its cluster size? If so, what does
  that curve look like?
- How quickly/often are people upgrading? Are there popular and unpopular
  releases? Can we spot downgrades? Do they correlate with the perceived
  quality of the releases (see the community ratings in
  <https://jenkins-ci.org/changelog/>)? Can we use this to warn us when a
  release seems to be unpopular?

These are all harder questions to answer than what the stats page shows today,
but I think they are more technically interesting for you, and quite adequate
in scope for GSoC.

If you haven't been a Jenkins user and find it difficult to get your head
around what kind of questions we want statistics to answer, maybe the way to
go is to go one level meta: find a way to make this data available for ad hoc
queries, so that people with interesting questions can query it in some
generic language, à la Apache Pig.

>     We can maintain a duration window (adjustable by the user) and show
>     stats for that time frame rather than only the monthly data. A similar
>     idea related to processing is already on the project's wiki page. I
>     will elaborate on all these points in detail in my proposal.
>

This would be a great help.
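
To picture the adjustable duration window, here is a minimal Java sketch. Java is one of the preferred languages for this project. The census JSON fields are not documented yet, so the sketch assumes a hypothetical `Report` shape: an installation ID, a month, and a set of plugin names. `Report`, `topPlugins`, and the sample data are illustrative only. For a user-chosen window, the sketch counts how many distinct installations reported each plugin and keeps the top n. That is the kind of data a "top 20 plugins" bar chart would need.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PluginWindowStats {

    // Hypothetical shape of one census report; the real JSON fields may differ.
    static final class Report {
        final String instanceId;
        final YearMonth month;
        final Set<String> plugins;

        Report(String instanceId, YearMonth month, Set<String> plugins) {
            this.instanceId = instanceId;
            this.month = month;
            this.plugins = plugins;
        }
    }

    // Returns the n plugins used by the most distinct installations in [from, to]
    // (inclusive), most popular first; ties are broken alphabetically.
    static LinkedHashMap<String, Integer> topPlugins(List<Report> reports,
                                                     YearMonth from, YearMonth to, int n) {
        Map<String, Set<String>> users = new HashMap<>();
        for (Report r : reports) {
            if (r.month.isBefore(from) || r.month.isAfter(to)) {
                continue; // outside the user-chosen window
            }
            for (String plugin : r.plugins) {
                // count each installation once per plugin, however many months it reported
                users.computeIfAbsent(plugin, k -> new HashSet<>()).add(r.instanceId);
            }
        }
        List<String> names = new ArrayList<>(users.keySet());
        names.sort((a, b) -> {
            int byCount = Integer.compare(users.get(b).size(), users.get(a).size());
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        LinkedHashMap<String, Integer> top = new LinkedHashMap<>();
        for (String name : names.subList(0, Math.min(n, names.size()))) {
            top.put(name, users.get(name).size());
        }
        return top;
    }

    public static void main(String[] args) {
        List<Report> reports = List.of(
            new Report("a", YearMonth.of(2016, 1), Set.of("git", "junit")),
            new Report("a", YearMonth.of(2016, 2), Set.of("git", "junit")),
            new Report("b", YearMonth.of(2016, 2), Set.of("git", "ant")),
            new Report("c", YearMonth.of(2015, 6), Set.of("ant")));
        // Window Jan-Feb 2016, top 2 plugins; prints {git=2, ant=1}
        System.out.println(topPlugins(reports, YearMonth.of(2016, 1), YearMonth.of(2016, 2), 2));
    }
}
```

The sketch counts distinct installations rather than raw reports. This stops an installation that reports every month from being counted once per month in the window. Whether that is the right popularity metric is the kind of question worth settling first. The sketch needs Java 9 or later for `List.of` and `Set.of`.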

> - Jenkins plugin dependency graph
>   [link <http://stats.jenkins-ci.org/jenkins-stats/jenkinsgraph.html?filter=kohsuke>]
>
>   - I think the dependency graph can be exploited to tell which plugins
>     are more likely to be used together. Can someone please clarify what
>     exactly "dependency" denotes here? And where can I find the source
>     code for this?
>

Here: <https://github.com/jenkinsci/infra-statistics>

> Census data [link <https://jenkins-ci.org/census/>]
>
>   - What are the metadata/fields for these JSON files?
>

Yeah, we should document this. Daniel or I will get back to you on that one.
You'll also want a sense of the data set size.

> Repo for the current sources [link
> <https://github.com/jenkinsci/infra-statistics>].
>
>   - Which languages other than Groovy can candidates use for this
>     project?
>

Java or Groovy would be preferred. That way, more people can keep working on
it after GSoC ends.

> So, after going through all these links, what should my next step be to
> contribute to this project?
>

I believe we need to drive toward your creating a project plan that you'll
then submit to GSoC. It sounds to me like you still need to get oriented in
what exists, and probably learn a bit about Jenkins: what it does, who uses
it, that sort of thing. I think that'll help you figure out which interesting
questions we are trying to answer with data mining. If you want to hear more
brainstorming from me or others, we are happy to provide it.

In parallel, we'd like to hear from you about a specific area you want to take
on; "usage stat analysis" is still too big and vague. There are also upcoming
student office hours that you might be interested in.

Looking forward to suggestions.

>
> Thanks a lot.
>
> Regards,
> Payal Priyadarshini
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jenkinsci-dev/5550ea02-93b7-45ca-a992-2149311ec881%40googlegroups.com
> <https://groups.google.com/d/msgid/jenkinsci-dev/5550ea02-93b7-45ca-a992-2149311ec881%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

--
Kohsuke Kawaguchi
