I'm not proposing to summarize or display contributions based on LOC
changes -- this is one of the worst metrics that I know of. To give an
extreme example, take a look at the contribution graph for Apache ORC:

https://github.com/apache/orc/graphs/contributors

I don't think you can infer anything about the "magnitude" of
contributions when you're dealing with vendored code, moving things
around, etc.

I do think that summarizing the number of contributed patches is useful,
and helps acknowledge the general level of investment of individuals
and their corporate patrons. I estimate that there has already been an
effective net investment in Arrow of somewhere between $2 million and
$5 million; to the extent that we can recognize this investment in
the ecosystem's common good (since Arrow truly is a "common good"
project more than many others), it could encourage increased investment
in time.
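Summarizing patch counts per contributor and per corporate patron, using the dated affiliation records proposed in the original message quoted below, could look roughly like this minimal sketch. It assumes the changelog has already been reduced to (author, merge date) pairs. The names `AFFILIATIONS`, `affiliation_at`, and `summarize`, and the sample merge dates, are hypothetical and do not refer to any existing Arrow tooling. It counts patches, not lines of code.

```python
from collections import Counter
from datetime import date

# Hypothetical affiliation records, mirroring the YAML-style entries in the
# proposal quoted below: (name, affiliations, effective_date).
AFFILIATIONS = [
    ("Wes McKinney", ["Cloudera"], date(2016, 1, 1)),
    ("Wes McKinney", ["Two Sigma"], date(2016, 8, 26)),
    ("Wes McKinney", ["Ursa Labs", "RStudio"], date(2018, 4, 17)),
]


def affiliation_at(name, when, records=AFFILIATIONS):
    """Return the affiliations in effect for `name` on date `when`.

    Uses the record with the latest effective_date on or before `when`;
    returns an empty list if no record applies yet.
    """
    best_orgs, best_date = [], None
    for rec_name, orgs, effective in records:
        if rec_name != name or effective > when:
            continue
        if best_date is None or effective > best_date:
            best_orgs, best_date = orgs, effective
    return list(best_orgs)


def summarize(patches, opted_out=frozenset(), records=AFFILIATIONS):
    """Count merged patches per contributor and per supporting organization.

    `patches` is an iterable of (author, merge_date) pairs, a hypothetical
    simplification of what a changelog scraper would produce. Organizations
    listed in `opted_out` are never credited.
    """
    by_person = Counter()
    by_org = Counter()
    for author, when in patches:
        by_person[author] += 1
        for org in affiliation_at(author, when, records):
            if org not in opted_out:
                by_org[org] += 1
    return by_person, by_org


# Hypothetical merge dates, for illustration only.
patches = [
    ("Wes McKinney", date(2016, 3, 1)),
    ("Wes McKinney", date(2017, 1, 5)),
    ("Wes McKinney", date(2018, 6, 1)),
]
by_person, by_org = summarize(patches)
print(by_person.most_common())  # [('Wes McKinney', 3)]
print(by_org.most_common())
```

In this sketch a patch merged while someone lists several affiliations (for example, Ursa Labs and RStudio) is credited in full to each of them; splitting the credit fractionally would be an equally reasonable choice.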

If we determine that contributors are making contributions for the
purpose of "resume padding", I think we can deal with that on a
case-by-case basis.

- Wes

On Thu, Aug 16, 2018 at 4:37 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> What about separating committers and companies? We could have a section 
> listing all committers as we currently do and have a separate listing of all 
> companies that employed a committer while they were contributing.
>
> This will give individuals and companies attribution without creating a big
> comparison matrix between companies. The entry barrier for both would be
> having been voted in as a committer, so a single 100k-LOC reformatting change
> will not have an impact on this.
>
> Uwe
>
>> Am 16.08.2018 um 10:02 schrieb Julian Hyde <jh...@apache.org>:
>>
>> This is a tough one. I think we need to strike a delicate balance: we
>> should thank companies for being benefactors, but should not put up
>> with bragging (or as Ted puts it, genital comparisons).
>>
>> In Calcite, we allow committers to show their company affiliations[1].
>> I was initially concerned, but it has turned out well: it illustrates
>> that our committers come from a diverse set of employers. (Which
>> reminds me... I need to update my affiliation in that table.)
>>
>> I think we should allow committers to give their employers due credit.
>> But if companies start to abuse this, we are in control. We can remove
>> the credit from the site.
>>
>> Julian
>>
>> [1] https://calcite.apache.org/community/#project-members
>>> On Thu, Aug 16, 2018 at 12:08 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>> Yes, there are several such examples. And it turned into a monstrous mess
>>> with companies bragging over lines of code changed. Oddly, the guys who did
>>> lots of reformatting did really well.
>>>
>>> There is also the problem of the very strong Apache tradition that it is
>>> individuals who contribute to projects, not companies. The most that
>>> companies get is a thank you.
>>>
>>> I think that Wes is right that the tragedy of the commons is a serious
>>> threat, but restarting the Hadoop genital comparisons is not a great course
>>> of action.
>>>
>>>> On Wed, Aug 15, 2018, 19:33 Abdul Rahman <abdulrahman...@outlook.com> 
>>>> wrote:
>>>>
>>>> Are there examples of other larger Apache projects that have done this? I
>>>> am assuming this should happen rather frequently given the large number of
>>>> popular Apache projects (or any other open source project).
>>>>
>>>> ________________________________
>>>> From: Wes McKinney <wesmck...@gmail.com>
>>>> Sent: Wednesday, August 15, 2018 4:18:24 PM
>>>> To: dev@arrow.apache.org
>>>> Subject: Increasing transparency of corporate support for Apache Arrow
>>>> development
>>>>
>>>> hi all,
>>>>
>>>> I have been spending both compensated (inside the bounds of 40-hour
>>>> work weeks) and uncompensated time (work beyond 40 hours per week)
>>>> working on Apache Arrow since the project started. In that time I have
>>>> changed corporate affiliations multiple times, and I have made career
>>>> decisions on the basis of obtaining ongoing support for Arrow
>>>> development.
>>>>
>>>> I think it could be a good idea to bring better transparency to the
>>>> support that corporations provide by allowing employees to
>>>> contribute to the Arrow project without having to spend uncompensated
>>>> time doing so. If individuals are contributing on their own time, that
>>>> is also useful information to have.
>>>>
>>>> In today's age of tenuous sustainability for large open source
>>>> projects, I think it is helpful to recognize, and hold up as positive
>>>> examples, companies that are investing time and money to allow
>>>> individuals to contribute to this project. Open source has a
>>>> significant "free-loader" problem, and we are already starting to
>>>> occasionally experience free-loading behavior wherein individuals or
>>>> corporations treat this project as a source of free labor.
>>>>
>>>> Building a project like Apache Arrow is difficult, because, by
>>>> providing an open standard columnar memory format and a development
>>>> platform for doing many other things, we are enabling downstream
>>>> applications to solve problems in new and valuable ways. While such
>>>> users of Arrow may derive an economic benefit, it is difficult to
>>>> measure and even more difficult to judge how much to give back. As
>>>> time goes on, we will be increasingly reliant on proactive investment
>>>> in and support for the maintenance and growth of this project; otherwise
>>>> in the long run we may be doomed to the "tragedy of the commons", and
>>>> no one wants that.
>>>>
>>>> Ultimately Apache projects are about individuals contributing to the
>>>> projects of their own free will, but we are frequently dependent on
>>>> financial support so that individuals can afford to contribute.
>>>>
>>>> Any thoughts about what we could do? I was thinking about having a
>>>> page on the Arrow website showing top individual contributors, top
>>>> "maintainers" (by # of patches merged; I wonder if it is possible to
>>>> scrape code review analytics), and top corporate sponsors by number of
>>>> supported patches. To implement the latter, we would need to depend on
>>>> data provided by contributors to state their affiliations and the
>>>> effective date of such affiliation so that it can be updated in the
>>>> "database".
>>>>
>>>> For example, I would have entries such as:
>>>>
>>>> - name: Wes McKinney
>>>>   affiliation: [Cloudera]
>>>>   effective_date: "2016-01-01"
>>>> - name: Wes McKinney
>>>>   affiliation: [Two Sigma]
>>>>   effective_date: "2016-08-26"
>>>> - name: Wes McKinney
>>>>   affiliation: [Ursa Labs, RStudio]
>>>>   effective_date: "2018-04-17"
>>>>
>>>> The analytics on the changelog could be implemented with a simple
>>>> Python script. Corporations could opt out of having their
>>>> contributions attributed.
>>>>
>>>> Thanks,
>>>> Wes
>>>>
>
