Increasing transparency of corporate support for Apache Arrow development

Wes McKinney Wed, 15 Aug 2018 16:19:44 -0700

hi all,

I have been spending both compensated time (within the bounds of a
40-hour work week) and uncompensated time (work beyond 40 hours per
week) working on Apache Arrow since the project started. In that time I
have changed corporate affiliations multiple times, and I have made
career decisions on the basis of obtaining ongoing support for Arrow
development.


I think it could be a good idea to bring better transparency to the
support that corporations are providing by allowing employees to
contribute to the Arrow project without having to spend uncompensated
time doing so. If individuals are contributing on their own time, that
is also useful information to have.

In today's age of tenuous sustainability for large open source
projects, I think it is helpful to recognize, and make a positive
example of, companies that are investing time and money to allow
individuals to contribute to this project. Open source has a
significant "freeloader" problem, and we are already starting to
occasionally experience freeloading behavior wherein individuals or
corporations treat this project as a source of free labor.

Building a project like Apache Arrow is difficult, because, by
providing an open standard columnar memory format and a development
platform for doing many other things, we are enabling downstream
applications to solve problems in new and valuable ways. While such
users of Arrow may derive an economic benefit, it is difficult to
measure and even more difficult to judge how much to give back. As
time goes on, we will be increasingly reliant on proactive investment
in and support for the maintenance and growth of this project.
Otherwise, in the long run we may be doomed to the "tragedy of the
commons", and no one wants that.

Ultimately Apache projects are about individuals contributing to the
projects of their own free will, but we are frequently dependent on
financial support so that individuals can afford to contribute.

Any thoughts about what we could do? I was thinking about having a
page on the Arrow website showing top individual contributors, top
"maintainers" (by # of patches merged; I wonder if it is possible to
scrape code review analytics), and top corporate sponsors by number of
supported patches. To implement the latter, we would need to depend on
data provided by contributors to state their affiliations and the
effective date of such affiliation so that it can be updated in the
"database".

For example, I would have entries such as:

- name: Wes McKinney
  affiliation: [Cloudera]
  effective_date: "2016-01-01"
- name: Wes McKinney
  affiliation: [Two Sigma]
  effective_date: "2016-08-26"
- name: Wes McKinney
  affiliation: [Ursa Labs, RStudio]
  effective_date: "2018-04-17"

The analytics on the changelog could be implemented with a simple
Python script. Corporations could opt out of having their
contributions attributed.
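
To make the idea concrete, here is a rough sketch of what such a script
might look like. It is only an illustration under some assumptions: the
records are written inline as Python dicts mirroring the entries above
(rather than loaded from a real file), the commit list is made-up sample
data standing in for whatever we would parse out of the changelog, and
the function names and the "(independent / unknown)" bucket are
hypothetical. A patch with more than one sponsor is counted once for
each of them, which is just one possible choice.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# Affiliation records mirroring the example entries in this email.
AFFILIATIONS = [
    {"name": "Wes McKinney", "affiliation": ["Cloudera"],
     "effective_date": "2016-01-01"},
    {"name": "Wes McKinney", "affiliation": ["Two Sigma"],
     "effective_date": "2016-08-26"},
    {"name": "Wes McKinney", "affiliation": ["Ursa Labs", "RStudio"],
     "effective_date": "2018-04-17"},
]


def build_index(records):
    """Group records by contributor, each history sorted by effective date."""
    index = {}
    for rec in records:
        start = date.fromisoformat(rec["effective_date"])
        index.setdefault(rec["name"], []).append((start, rec["affiliation"]))
    for history in index.values():
        history.sort(key=lambda item: item[0])
    return index


def affiliation_on(index, name, day):
    """Return the affiliations in effect for `name` on `day` ([] if none)."""
    history = index.get(name, [])
    starts = [start for start, _ in history]
    # Position of the last record whose effective date is on or before `day`.
    pos = bisect_right(starts, day)
    return history[pos - 1][1] if pos else []


def patches_by_sponsor(index, commits):
    """Count patches per sponsor; multi-sponsor patches count for each one."""
    counts = Counter()
    for author, day in commits:
        sponsors = affiliation_on(index, author, day)
        counts.update(sponsors or ["(independent / unknown)"])
    return counts


if __name__ == "__main__":
    # Hypothetical (author, commit date) pairs, e.g. parsed from `git log`.
    sample_commits = [
        ("Wes McKinney", date(2016, 3, 1)),
        ("Wes McKinney", date(2016, 9, 15)),
        ("Wes McKinney", date(2018, 5, 2)),
        ("Another Contributor", date(2018, 5, 3)),
    ]
    index = build_index(AFFILIATIONS)
    for sponsor, n in patches_by_sponsor(index, sample_commits).most_common():
        print(f"{sponsor}: {n}")
```

Keeping the effective date on each record means a contributor's older
patches stay attributed to whoever was supporting them at the time, even
after they change jobs.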

Thanks,
Wes
