I'm not proposing to summarize or display contributions based on LOC changes -- this is one of the worst metrics that I know of. To give an extreme example, take a look at the contribution graph for Apache ORC:
https://github.com/apache/orc/graphs/contributors

I don't think you can infer anything about the "magnitude" of
contributions when you're dealing with vendored code, moving things
around etc.

I do think that summarizing the number of contributed patches is
useful, and helps acknowledge the general level of investment of
individuals and their corporate patrons. I estimate that there has
already been an effective net investment in Arrow of somewhere between
$2 million and $5 million, and to the extent that we can recognize this
investment in the ecosystem's common good (since Arrow truly is a
"common good" project more than many others) it could encourage
increased investment in time.

If we determine that contributors are making contributions for the
purpose of "resume padding", I think we can deal with that on a
case-by-case basis.

- Wes

On Thu, Aug 16, 2018 at 4:37 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> What about separating committers and companies? We could have a section
> listing all committers as we currently do and have a separate listing of all
> companies that employed a committer while they were contributing.
>
> This will give individuals and companies attribution but does not make a big
> matrix of comparison between companies. The entry barrier for both would be
> to have been voted a committer. Thus single 100k loc changes for reformatting
> will not have an impact on this.
>
> Uwe
>
>> On 16.08.2018 at 10:02, Julian Hyde <jh...@apache.org> wrote:
>>
>> This is a tough one. I think we need to strike a delicate balance: we
>> should thank companies for being benefactors, but should not put up
>> with bragging (or as Ted puts it, genital comparisons).
>>
>> In Calcite, we allow committers to show their company affiliations[1].
>> I was initially concerned, but it has turned out well: it illustrates
>> that our committers come from a diverse set of employers. (Which
>> reminds me... I need to update my affiliation in that table.)
>>
>> I think we should allow committers to give their employers due credit.
>> But if companies start to abuse this, we are in control. We can remove
>> the credit from the site.
>>
>> Julian
>>
>> [1] https://calcite.apache.org/community/#project-members
>>> On Thu, Aug 16, 2018 at 12:08 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>> Yes, there are several such examples. And it turned into a monstrous mess
>>> with companies bragging over lines of code changed. Oddly, the guys who did
>>> lots of reformatting did really well.
>>>
>>> There is also the problem of the very strong Apache tradition that it is
>>> individuals who contribute to projects, not companies. The most that
>>> companies get is a thank you.
>>>
>>> I think that wes is right on that the commons is a serious threat, but
>>> restarting the Hadoop genital comparisons is not a great course of action.
>>>
>>>> On Wed, Aug 15, 2018, 19:33 Abdul Rahman <abdulrahman...@outlook.com>
>>>> wrote:
>>>>
>>>> Are there examples of other larger Apache projects that have done this? I
>>>> am assuming this should happen rather frequently given the large number of
>>>> popular Apache projects (or just any other Open Source project)
>>>>
>>>> Get Outlook for Android<https://aka.ms/ghei36>
>>>>
>>>> ________________________________
>>>> From: Wes McKinney <wesmck...@gmail.com>
>>>> Sent: Wednesday, August 15, 2018 4:18:24 PM
>>>> To: dev@arrow.apache.org
>>>> Subject: Increasing transparency of corporate support for Apache Arrow
>>>> development
>>>>
>>>> hi all,
>>>>
>>>> I have been spending both compensated (inside the bounds of 40 hour
>>>> work weeks) and uncompensated time (work beyond 40 hours per week)
>>>> working on Apache Arrow since the project started. In that time I have
>>>> changed corporate affiliations multiple times, and I have made career
>>>> decisions on the basis of obtaining ongoing support for Arrow
>>>> development.
>>>>
>>>> I think it could be a good idea to bring better transparency into
>>>> support that corporations are providing by allowing employees to
>>>> contribute to the Arrow project without having to spend uncompensated
>>>> time doing so. If individuals are contributing on their own time, that
>>>> is also useful information to have.
>>>>
>>>> In today's age of tenuous sustainability for large open source
>>>> projects, I think it is helpful to make a positive example of /
>>>> recognize companies that are investing time and money to allow
>>>> individuals to contribute to this project. Open source has a
>>>> significant "free loader" problem, and we are already starting to
>>>> occasionally experience free-loading behavior wherein individuals or
>>>> corporations treat this project as a source of free labor.
>>>>
>>>> Building a project like Apache Arrow is difficult, because, by
>>>> providing an open standard columnar memory format and a development
>>>> platform for doing many other things, we are enabling downstream
>>>> applications to solve problems in new and valuable ways. While such
>>>> users of Arrow may derive an economic benefit, it is difficult to
>>>> measure and even more difficult to judge how much to give back. As
>>>> time goes on, we will be increasingly reliant on proactive investment
>>>> and support in the maintenance and growth of this project, otherwise
>>>> in the long run we may be doomed to the "tragedy of the commons", and
>>>> no one wants that.
>>>>
>>>> Ultimately Apache projects are about individuals contributing to the
>>>> projects of their own free will, but we are frequently dependent on
>>>> financial support so that individuals can afford to contribute.
>>>>
>>>> Any thoughts about what we could do?
>>>> I was thinking about having a
>>>> page on the Arrow website showing top individual contributors, top
>>>> "maintainers" (by # of patches merged; I wonder if it is possible to
>>>> scrape code review analytics), and top corporate sponsors by number of
>>>> supported patches. To implement the latter, we would need to depend on
>>>> data provided by contributors to state their affiliations and the
>>>> effective date of such affiliation so that it can be updated in the
>>>> "database".
>>>>
>>>> For example, I would have entries such as:
>>>>
>>>> - name: Wes McKinney
>>>>   affiliation: [Cloudera]
>>>>   effective_date: "2016-01-01"
>>>> - name: Wes McKinney
>>>>   affiliation: [Two Sigma]
>>>>   effective_date: "2016-08-26"
>>>> - name: Wes McKinney
>>>>   affiliation: [Ursa Labs, RStudio]
>>>>   effective_date: "2018-04-17"
>>>>
>>>> The analytics on the changelog could be implemented with a simple
>>>> Python script. Corporations could opt-out of having their
>>>> contributions attributed.
>>>>
>>>> Thanks,
>>>> Wes
>>>>
>
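
For illustration only, here is a rough sketch of how the affiliation
entries in the original proposal might be used to attribute patches. It
is not an existing Arrow script. The function names (build_index,
affiliations_on, count_patches) and the (author, date) patch format are
assumptions made for this example, and it assumes each affiliation
entry stays in effect until the same person's next effective_date.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# Affiliation records copied from the example entries in the proposal.
AFFILIATIONS = [
    {"name": "Wes McKinney", "affiliation": ["Cloudera"],
     "effective_date": "2016-01-01"},
    {"name": "Wes McKinney", "affiliation": ["Two Sigma"],
     "effective_date": "2016-08-26"},
    {"name": "Wes McKinney", "affiliation": ["Ursa Labs", "RStudio"],
     "effective_date": "2018-04-17"},
]


def build_index(records):
    """Group records by contributor, sorted by effective date (hypothetical helper)."""
    index = {}
    for rec in records:
        start = date.fromisoformat(rec["effective_date"])
        index.setdefault(rec["name"], []).append((start, rec["affiliation"]))
    for history in index.values():
        history.sort(key=lambda item: item[0])
    return index


def affiliations_on(index, name, day):
    """Return the affiliations in effect for `name` on `day`, or [] if none."""
    history = index.get(name, [])
    starts = [start for start, _ in history]
    # Position of the last entry whose effective date is on or before `day`.
    pos = bisect_right(starts, day)
    return history[pos - 1][1] if pos else []


def count_patches(index, patches):
    """Count patches per organization from (author, date) pairs.

    Assumption: a patch written under several simultaneous affiliations
    counts once for each of them.
    """
    counts = Counter()
    for author, day in patches:
        for org in affiliations_on(index, author, day):
            counts[org] += 1
    return counts
```

In practice the (author, date) pairs would have to come from the
changelog or commit history. Contributors with no entry are simply left
unattributed, which is also one way an opt-out could be honored.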