Look at how Apache Flink is doing the reporting on the community:
https://flink.apache.org/news/2015/12/18/a-year-in-review.html

Maybe we can learn from this.

Best regards,

Pierre Smits

ORRTIZ.COM <http://www.orrtiz.com>
OFBiz based solutions & services

OFBiz Extensions Marketplace
http://oem.ofbizci.net/oci-2/

On Thu, Mar 31, 2016 at 10:40 AM, Pierre Smits <[email protected]>
wrote:

> Hi Carol, all,
>
> You are right, numbers without context mean nothing. It is all about
> correlation. Yet, one must start to measure first before the insights can
> be created. But it must not be the end goal. It must all be seen in
> relation to adoption, community growth and health.
>
> Best regards,
>
> Pierre Smits
>
> ORRTIZ.COM <http://www.orrtiz.com>
> OFBiz based solutions & services
>
> OFBiz Extensions Marketplace
> http://oem.ofbizci.net/oci-2/
>
> On Wed, Mar 30, 2016 at 10:13 AM, Carol Pearson <
> [email protected]> wrote:
>
>> Hi,
>>
>> I've looked at a bunch of things to get a handle on our users, growth, and
>> what some other Apache projects have for committers and community.
>> Trafodion is a database project, so I went looking for real data,
>> everything from participation on our email lists (and new posts there) to
>> Jira activity to GitHub forks and pulls and commits.  I also monitor some
>> more fanciful stats, looking for references to Trafodion on Twitter,
>> Stack Overflow, etc.
>>
>> As far as email list activity goes, I use the data from the mailing list
>> archive.  The user list was very quiet (fewer than 20 emails total from
>> when Trafodion started incubating through December).  That's not very
>> inviting - our users who drove by to check us out didn't see much
>> activity, even though there was a lot.  So I don't pay too much attention
>> to data in that range.  Our user list has shown a big jump in usage in
>> that period, slightly cannibalizing the dev list.
>>
>> Here are the numbers I have for Jan/Feb/March.  Sorry for the funky ascii
>> formatting, but mailing lists don't do attachments and tables very well:
>>
>> User List:
>>
>> MON        Total Posts   Distinct Posters   Non-Esgyn Posters
>> =============================================================
>> JAN2016             19                 12                   2
>> FEB2016            291                 42                   6
>> MAR2016            126                 25                   1
>>
>> Dev List:
>>
>> MON        Total Posts   Distinct Posters   Non-Esgyn Posters
>> =============================================================
>> DEC2015            243                 29                   6
>> JAN2016            199                 24                   3
>> FEB2016            181                 24                   4
>> MAR2016            200                 31                   4
>>
>>
>> Note that Dec2015 was a release month and the Non-Esgyn posters were
>> mostly IPMC posters helping guide our release with respect to things like
>> licensing guidance.
>>
>> So we're seeing some additional participation but it's still heavily
>> dominated by Esgyn.
>>
>> I count distinct posters by email address, so posters that use two
>> different emails count twice.
>>
>> We have Google Analytics on the newly-redesigned website.  It shows
>> similar numbers of hits between new users and returning users, but I'm not
>> sure how significant that is, since many returning users from Esgyn don't
>> need to re-hit the website.
>>
>> Still, data is data, and here's a sample for the period from 29Feb through
>> today, 29Mar:
>>
>> Metric                 New User   Returning User    Total
>> =========================================================
>> Sessions                    885              895     1780
>> % New Sessions             100%               0%   49.72%
>> Bounce Rate                 60%           48.83%   54.38%
>> Pages/Session              2.09             2.39     2.24
>> Avg Session Duration      02:01            02:57    02:29
>>
>> And so on.
>>
>>
>> But one thing I've learned over the years is that numbers are just....
>> numbers.  These are nice (and I have plenty more), but the real question
>> is, "what's a good score?"  What's typical for Apache projects for
>> committer distribution? What's typical for user list activity?
>>
>> I started with the first question: Where do committers come from and
>> what's their distribution?  I used the Apache committer lists and the
>> websites that indicated committer affiliation. This wasn't perfect:  Some
>> projects don't have committer affiliation; I can't trust others to be
>> perfectly up-to-date.  Further, it doesn't indicate committer activity.
>> Still, it gives some targets.
>>
>> After I started, I refined the data a little bit by looking for projects
>> similar to Trafodion along a couple of possible vectors:  data management
>> or Hadoop/Big Data ecosystem and recently graduated.  The latter category
>> is particularly interesting to me because I would expect more diversity of
>> committers over time, if only because developers move around.
>>
>> I was not able to collect data on currently incubating projects because
>> the list of committers I worked from on ASF did not include incubating
>> projects in the phonebook, though the reports have them and many project
>> websites have them.  I was more interested in projects that climbed the
>> mountain we're trying to climb.
>>
>> Here's some of the data I collected back in February:
>>
>> Trafodion:
>> ORG                    Count      Pct
>> =====================================
>> Esgyn                     10   66.67%
>> orrtiz.com                 1    6.67%
>> Unavailable/Inactive       4   26.67%
>> Total                     15
>>
>> HBase:
>> ORG              Count   Pct
>> ============================
>> Cloudera            12   26%
>> Continuuity          1    2%
>> Dropbox              1    2%
>> Explorys             1    2%
>> Facebook             9   19%
>> Hortonworks          7   15%
>> IBM                  1    2%
>> Intel                2    4%
>> Salesforce.com       3    6%
>> Scaled Risk          1    2%
>> Taobao               1    2%
>> unaffiliated         1    2%
>> WANdisco             1    2%
>> Xiaomi               4    9%
>> Yahoo!               1    2%
>> Yuantiku             1    2%
>>
>>
>> Formatting this is getting crazy and it's getting late since I was up
>> early travelling. I'll just C&P, and my apologies for the alignment.
>>
>> Ignite:  Graduated Sept 2015
>> ChronoTrack            1    4%
>> CyberAgent, Inc.       1    4%
>> Engiweb Security       1    4%
>> Evosent Consulting     1    4%
>> Fitech Source          1    4%
>> GridGain              14   58%
>> Pivotal                1    4%
>> Shoutlet               1    4%
>> Trend Micro            1    4%
>> WANdisco               2    8%
>> Grand Total           24
>>
>> Calcite:  Graduated Nov 2015
>> Dremio                 1    7%
>> Hortonworks            7   47%
>> Intel                  1    7%
>> MapR                   3   20%
>> NetCracker             1    7%
>> NGData                 1    7%
>> Salesforce             1    7%
>> Grand Total           15
>>
>> Spark:
>> Alibaba                              1    2%
>> Bizo                                 1    2%
>> ClearStory Data                      1    2%
>> Cloudera                             4    9%
>> Databricks                          15   34%
>> Databricks, MIT                      1    2%
>> Facebook                             1    2%
>> Hortonworks                          1    2%
>> IBM                                  1    2%
>> Intel                                2    5%
>> Mxit                                 1    2%
>> Netflix                              1    2%
>> NTT Data                             1    2%
>> Quantifind                           1    2%
>> QuestTec B.V.                        1    2%
>> Tachyon Nexus                        1    2%
>> UC Berkeley                          5   11%
>> University of Michigan, Ann Arbor    1    2%
>> Webtrends                            1    2%
>> Yahoo!                               3    7%
>> Grand Total                         44
>>
>> I have a spreadsheet with a bunch more companies that I'm happy to send to
>> anyone who asks - the data was all gleaned publicly.
>>
>> The upshot from what I saw was that even recently graduated projects are
>> typically in the 50-60% range of committers from a single company (and I
>> would guess they are moving away from that as part of the Apache Way). The
>> largest percentage I saw was 76%, on the Ambari project.
>>
>> So that's some of the user data/growth data I have.  Apparently, I'm more
>> of a data junky than I thought....
>>
>> -Carol P.
>>
>>
>> ---------------------------------------------------------------
>> Email:    [email protected]
>> Twitter:  @CarolP222
>> ---------------------------------------------------------------
>>
>> On Tue, Mar 29, 2016 at 6:57 PM, Andrew Purtell <[email protected]>
>> wrote:
>>
>> > On Tue, Mar 29, 2016 at 10:01 AM, Pierre Smits <[email protected]>
>> > wrote:
>> >
>> > > A distribution with Apache-only elements (Hadoop, HBase, Zookeeper,
>> > > Ambari, etc) would surely be a nice-to-have, and also a means to show
>> > > cross-selling Apache products that could lead to cross-pollination
>> > > (adoption and community growth wise).
>> > >
>> >
>> > That's known as Apache Bigtop.
>> >
>> >
>> >
>> > --
>> > Best regards,
>> >
>> >    - Andy
>> >
>> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> > (via Tom White)
>> >
>>
>
>
