Hey, first of all, thanks a lot for your work, Uwe's work, and the work of all the other mergers and contributors. Now, to the maintainer problem:

# Arrow as "a library"

One thing that makes Arrow special is that it is not a single library but many (one for each language), and many of them are not only a binding to a C/C++ lib but partly a complete re-implementation of the protocol, e.g.:

- C++: one core, but also contains Python specialties
- Java: another core
- Rust: yet another core
- Python: a binding to C++, but also a lot more stuff because of Pandas

...

And you two are maintaining all of them, and I doubt that you have the capacity and knowledge to do this at the desired level of quality (which is natural, not a personal issue or offense). So I would call this "pseudo-maintenance", since you are solely the gatekeepers who do some shallow reviewing and carry the burden of the housekeeping and the merging.

So why accept these language bindings in the first place without bringing a core maintainer in place? For example, let's say someone proposes a binding to Haskell now. That should not be accepted as part of the official Apache implementation without a dedicated maintainer (ideally the PR author would be that person, but there may be others who step up). Right now, though, it might be too late to remove some of the incomplete / WIP implementations that don't have a core maintainer.

# GitHub

Another special thing to consider is that Arrow is (ab)using GitHub as a code hosting platform.
Even as a contributor, this has obvious uncool consequences:

- you have yet another issue hosting system to log in to
- links to issues don't work in the known magic way
- you're merging the PRs by closing them, which is not a very nice way, because it does not reflect the contributors' work in the project overview and personal profiles, and exactly this is a large part of the GitHub community (btw: merging PRs without using GitHub's merge button IS possible, as bors/bors-ng prove)

So as a potential maintainer, this is already a bummer, since I know that things are less comfortable than the system I would get from any normal GitHub or GitLab project. I'm not really sure how to solve this or if it should be solved (read about the laziness aspect in "Contribution VS Maintenance" below).

# Time / Payment

Yes, this is indeed a big issue. From what I can tell from the open source projects I was involved in, for large contributor crowds you normally have full/half-time positions in place for the core maintainers (look at the Mozilla projects, the Blender Foundation, Gnome / Red Hat). So at some point I think maintaining isn't a part-time / hobby thing anymore (without downgrading the hard work of hobby contributors, quite the contrary).

I don't have a link at hand, but I recall some discussion about GitHub and its importance for hiring (since it acts as a CV) after MS bought it, and some of the responses were "doing all this work in your free time is a privilege of wealthy, mostly-white men", which, without me signing this statement in this really bare form, already shows a problem of the open source world.

# Contribution VS Maintenance

The very "nice" thing about patch/PR contribution is that you do your work and then you can walk away, and it's the maintainer's problem to release the artifact, upgrade/migrate your code and ensure that the tests you've written never break. It's comfortable. Being a maintainer means all the opposite things.
And in the end, you get blamed for not supporting certain features (see the open source paragraph here: https://blog.ghost.org/5/ ) or for security disasters (remember the OpenSSL disaster). I think together with the previous point this means we have to get companies to pay for that work, and not just dump their features into an OSS repo.

# Path to Maintainership

So I think (from my narrow point of view!) that many people expect that the path from "outsider" to "maintainer" takes the route over "a lot of patch/PR contributions". If I'm reading your mail right, that is not necessarily the case for Apache projects, and I think that's great.

The "review PRs" path sounds great, but I think neither GitHub nor any other platform I'm aware of does a good job of getting people to do so. I mean, I see a PR and I can leave a review, but for me it is not really clear which consequences this has (naturally, random people don't have a veto on changes). So I can jump in when I think something is wrong, but I cannot approve a PR. This makes sense, but it poses the question of "how?!". I mean, it is pretty clear how to become a patch/PR contributor, but it is not clear how to become a maintainer, at least not in an easy way. (I'm sure it's written down somewhere.)

So, overall I think a clear call to action at the top of the README could help. Like "Hey, we're looking for maintainers, you could start by reviewing some PRs, and after some reviews maintainers will just be the last gatekeeper, and after some more time, you can even merge PRs on your own".

# My personal contribution

Triggered by this call for help, I'll try to get more involved in Python, C++ and Rust reviews.

So, these are some thoughts that I hope may help. Thanks again for addressing this issue and for your time and passion,

Marco

On 2018/06/30 14:57:42, Wes McKinney <w...@gmail.com> wrote:
> hi folks,
>
> Arrow has grown by leaps and bounds over the last 2.5 years.
> We are approaching our 2000th patch and on track to surpass 200 unique contributors by year end.
>
> All this contribution growth is great, but it has a hidden cost: the maintenance. The burden of maintaining the project: particularly reviewing and merging patches, has fallen on a very small number of people. From the commit logs, we can see how many patches each committer has merged:
>
> $ git shortlog -csn d5aa7c46692474376a3c31704cfc4783c86338f2..master
>   1289  Wes McKinney
>    268  Uwe L. Korn
>     74  Korn, Uwe
>     54  Antoine Pitrou
>     52  Julien Le Dem
>     39  Philipp Moritz
>     18  Kouhei Sutou
>     18  Steven Phillips
>     13  Bryan Cutler
>     11  Jacques Nadeau
>     10  Phillip Cloud
>      8  Brian Hulette
>      5  Robert Nishihara
>      5  adeneche
>      4  GitHub
>      3  Sidd
>      3  siddharth
>      1  AbdelHakim Deneche
>      1  Your Name Here
>
> So Uwe and I have merged ~84% of the patches in the project so far. This isn't a completely accurate reflection of the maintainer burden, since many others contribute to code reviews and other aspects of patch maintenance, and you have to be a committer to earn a place on this list.
>
> I'm not sure what's the best way to address this problem. The quality of our code review has declined at times as we struggle to keep up with the flow of patches -- I don't think this is good. Having the patch queue pile up isn't great either. Personally, I'm having a difficult time balancing project maintenance and patch authoring, particularly in the last 6 months.
>
> Unfortunately, many people believe that writing patches is the primary mode of contribution to an open source project. Apache projects explicitly state that non-patch contributions are valued in earning karma (committership and PMC membership).
> We're starting to have more corporate contributors come out of the woodwork, and while it's great for contributors to be paid to write patches for the project, they are rarely given the time and space to contribute meaningfully to maintenance.
>
> Any thoughts about how we can grow the maintainership? Somehow we need to reach ~5-6 core maintainers over the next year.
>
> Thanks,
> Wes