Hi all,

I have been spending both compensated time (within the bounds of a 40-hour work week) and uncompensated time (work beyond 40 hours per week) on Apache Arrow since the project started. In that time I have changed corporate affiliations multiple times, and I have made career decisions on the basis of obtaining ongoing support for Arrow development.
I think it could be a good idea to bring more transparency to the support that corporations provide when they allow employees to contribute to the Arrow project without having to spend uncompensated time doing so. If individuals are contributing on their own time, that is also useful information to have. In today's age of tenuous sustainability for large open source projects, I think it is helpful to recognize, and make a positive example of, companies that are investing time and money to allow individuals to contribute to this project.

Open source has a significant "free loader" problem, and we are already starting to see occasional free-loading behavior, where individuals or corporations treat this project as a source of free labor. Sustaining a project like Apache Arrow is difficult because, by providing an open standard columnar memory format and a development platform for doing many other things, we are enabling downstream applications to solve problems in new and valuable ways. While such users of Arrow may derive an economic benefit, that benefit is difficult to measure, and it is even more difficult to judge how much to give back. As time goes on, we will be increasingly reliant on proactive investment and support for the maintenance and growth of this project. Otherwise, in the long run we may be doomed to the "tragedy of the commons", and no one wants that.

Ultimately, Apache projects are about individuals contributing to the projects of their own free will, but we are frequently dependent on financial support so that individuals can afford to contribute.

Any thoughts about what we could do? I was thinking about having a page on the Arrow website showing top individual contributors, top "maintainers" (by # of patches merged; I wonder if it is possible to scrape code review analytics), and top corporate sponsors by number of supported patches.
To implement the latter, we would need to depend on data provided by contributors stating their affiliations and the effective date of each affiliation so that it can be updated in the "database". For example, I would have entries such as:

  - name: Wes McKinney
    affiliation: [Cloudera]
    effective_date: "2016-01-01"
  - name: Wes McKinney
    affiliation: [Two Sigma]
    effective_date: "2016-08-26"
  - name: Wes McKinney
    affiliation: [Ursa Labs, RStudio]
    effective_date: "2018-04-17"

The analytics on the changelog could be implemented with a simple Python script. Corporations could opt out of having their contributions attributed.

Thanks,
Wes
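P.S. Here is a rough sketch of what the changelog analytics might look like, only to make the idea concrete. It assumes the affiliation entries above have already been loaded into Python dicts and that the changelog has been reduced to (author, date) pairs. The names COMMITS, build_index, affiliation_on and patches_by_sponsor, the "Jane Doe" commit, and the "(unknown)" bucket are all made up for illustration. Crediting a patch to every affiliation listed in an entry is also just one possible choice, not a settled rule.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# Affiliation entries from the example above, with dates parsed.
AFFILIATIONS = [
    {"name": "Wes McKinney", "affiliation": ["Cloudera"],
     "effective_date": date(2016, 1, 1)},
    {"name": "Wes McKinney", "affiliation": ["Two Sigma"],
     "effective_date": date(2016, 8, 26)},
    {"name": "Wes McKinney", "affiliation": ["Ursa Labs", "RStudio"],
     "effective_date": date(2018, 4, 17)},
]

# Hypothetical changelog: (author, commit date). Invented data.
COMMITS = [
    ("Wes McKinney", date(2016, 3, 1)),
    ("Wes McKinney", date(2016, 9, 15)),
    ("Wes McKinney", date(2018, 5, 2)),
    ("Wes McKinney", date(2018, 6, 10)),
    ("Jane Doe", date(2018, 6, 11)),  # no affiliation entry on file
]


def build_index(records):
    """Map each name to a date-sorted list of (effective_date, affiliations)."""
    index = {}
    for rec in sorted(records, key=lambda r: r["effective_date"]):
        index.setdefault(rec["name"], []).append(
            (rec["effective_date"], rec["affiliation"])
        )
    return index


def affiliation_on(index, name, when):
    """Return the affiliations in effect for `name` on the date `when`."""
    history = index.get(name, [])
    dates = [d for d, _ in history]
    pos = bisect_right(dates, when)  # entries with effective_date <= when
    if pos == 0:
        return ["(unknown)"]
    return history[pos - 1][1]


def patches_by_sponsor(records, commits, opted_out=frozenset()):
    """Count patches per affiliation, skipping organizations that opted out."""
    index = build_index(records)
    counts = Counter()
    for author, when in commits:
        for org in affiliation_on(index, author, when):
            if org not in opted_out:
                counts[org] += 1
    return counts


if __name__ == "__main__":
    for org, n in patches_by_sponsor(AFFILIATIONS, COMMITS).most_common():
        print(org, n)
```

Passing something like opted_out={"RStudio"} would drop that organization from the tally while leaving the individual contributor counts untouched.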