hi Marco, some comments inline

On Sat, Jun 30, 2018 at 2:15 PM, Marco Neumann <ma...@crepererum.net.invalid> wrote:
> Hey,
>
> first of all, thanks a lot for your, Uwe's, the mergers' and contributors'
> work. Now, to the maintainer problem:
>
> # Arrow as "a library"
> One thing that makes Arrow special is that it is not a single, but many
> libraries (one for each language) and many of them are not only a
> binding to a C/C++ lib, but partly a complete re-implementation of the
> protocol, e.g.:
>
> - C++: one core, but also contains Python specialties
> - Java: another core
> - Rust: yet another core
> - Python: a binding to C++ but also a lot more stuff because of Pandas
> ...
>
> And you two are maintaining all of them and I doubt that you have the
> capacities and knowledge to do this at the desired level of quality
> (which is natural, not a personal issue or offense). So this I would
> call "pseudo-maintenance", since you're solely the gatekeeper that does
> some shallow reviewing and has the burden to do the housekeeping and
> the merging. So why accept these language bindings in the first
> place without bringing a core maintainer in place? For example, let's
> say someone proposes a binding to Haskell now. That should not be
> accepted as part of the official Apache implementation without a
> dedicated maintainer (ideally the PR-author would be that person, but
> there may be others who step up).

Most of the development activity, and the greatest need for help, is in
C++ and Python. The other area is dev/CI infrastructure and release
management. We're falling behind on implementation and design work
involving Java-land (I have been trying for about a year to hammer down
an improved Interval type), but that's a separate problem.

We are about to reach a point (particularly if Gandiva becomes part of
Apache Arrow) where more languages will become dependent on the C++
library. This makes the need for more C++ maintainers even more urgent.

I think the other libraries have done a good job of self-managing their
code (e.g. Java, JavaScript), and I frequently merge patches when there
is a +1 or some other consensus.

>
> Right now, it might be too late to remove some of the incomplete / WIP
> implementations that don't have a core maintainer though.

Honestly, the incomplete/WIP projects are not causing any maintenance
burden. It's the main projects and their development lifecycle that are
creating a lot of work.

>
> # GitHub
> Another special thing to consider is that Arrow is (ab)using GitHub as
> a code hosting platform. Even as a contributor, this has obvious bad
> uncool consequences:

I think these issues are red herrings. If maintainers are more motivated
by the gamification of their open source contributions than by the
health and success of the project, I really question how valuable they
are as maintainers.

>
> - you have yet another issue hosting system to log in

I strongly dispute the notion that using JIRA is a deterrent to
maintainers. If anything, it's a filter for drive-by contributors and
unserious maintainers. I say this as the project's primary JIRA gardener.

> - there is yet another information channel to keep track of (this ML
> for example, which has a semi-informative web interface telling you
> that you can only log in using Google but does not tell you how to
> subscribe to the list)
> - links to issues don't work in the known magic way

I think these things might deter passers-by, but I don't see why they
would be a problem for someone who is concerned with the health of the
project. As the primary maintainer of the project, these things don't
impact me in any way.

> - you're merging the PRs by closing them; which is by all means a not
> very nice way because it does not reflect the contributors' work in
> the project overview and personal profiles, but exactly this is a
> large part of the GitHub community (btw: merging PRs without using
> GitHub's merge button IS possible, as bors/bors-ng prove)

For each patch you contribute, you get one contribution "point" on
GitHub, but it won't show that you have a PR "merged". I don't see why
we should have to comply with GitHub's gamified approach to open source.

>
> So as a potential maintainer, this is already a bumper, since I know
> that there are things less comfortable than the system I would get from
> any normal GitHub or Gitlab project.
>
> I'm not really sure how to solve this or if it should be solved (read
> about the laziness aspect in "Contribution VS Maintenance" below)

I don't mean to be too dismissive of these concerns (they are common;
people have a difficult time with change), but I have long been critical
of people concerned with their "GitHub High Score". See some writing on
this from a while ago:

http://wesmckinney.com/blog/github-open-source-contributions/

>
> # Time / Payment
> Yes, this is indeed a big issue. From what I can tell from the open
> source projects I was involved in is that for large contributor crowds,
> you normally have full/half-time positions in place for the core
> maintainer (look at the Mozilla projects, the Blender Foundation, Gnome
> / Red Hat). So at one point I think maintaining isn't a part time /
> hobby thing anymore (w/o downgrading the hard work of
> Hobby-contributors, in contrast).
> I don't have a link at hand, but I recall
> some discussion about GitHub and its importance for hiring (since it
> acts as a CV) after MS bought it, and some of the responses are
> "doing all this work in your free time is a privilege of wealthy,
> mostly-white men", which without signing this statement in this really
> bare form already shows a problem of open source world.
>
> # Contribution VS Maintenance
> The very "nice" thing about patch/PR contribution is that you do your
> work and then you can walk away and it's the maintainer's problem to
> release the artifact, upgrade/migrate your code and ensure that the
> tests you've written never break. It's comfortable. Being a maintainer
> means all the opposite things. And in the end, you get blamed for not
> supporting certain features (see the open source paragraph here
> https://blog.ghost.org/5/ ) or for security disasters (remember the
> OpenSSL disaster).
>
> I think together with the previous point this means, we have to get
> companies to pay for that work, and not just dump their features to an
> OSS repo.

This is a huge problem. I have recently made some significant personal
financial sacrifices to be able to engineer an arrangement where I can
provide more scalable full-time employment opportunities for Apache
Arrow maintainers. See: http://wesmckinney.com/blog/announcing-ursalabs/.
Particularly in the United States, full-time employment is very
important to have health care and other benefits, so the best scenario
is for companies to sponsor full-time (100%, not 20%) maintainers. What
I have seen happen all too often is that a person might start out
spending 50-80% of their time doing OSS maintenance, and at some point
they get reassigned to proprietary projects and stop doing maintenance.

>
> # Path to Maintainership
> So I think (from my narrow point of view!) that many people expect that
> the path from "outsider" to "maintainer" takes the route over "a lot of
> patch/PR contributions".
> If I'm reading your mail right, that is not
> necessarily the case for Apache projects and I think that's great. The
> "review PRs" path sounds great, but I think GitHub or any platform I'm
> aware of doesn't do a good job in getting people to do so. I mean, I
> see a PR and I can leave a review, but for me it is not really clear
> what consequences this has (naturally, random people don't have a veto
> on changes). So I can jump in when I think something is wrong, but I
> cannot approve a PR. This makes sense, but it poses the question of
> "how?!". I mean, it is pretty clear on how to become a patch/PR
> contributor, but it is not clear on how to become a maintainer, at
> least not in an easy way. (I'm sure it's written down somewhere).

Since we just started a project wiki
(https://cwiki.apache.org/confluence/display/ARROW), I can write down a
list of all the things that I regularly do as a maintainer. Being a
"maintainer" is a project leadership role; you are a "prime mover". It
means you are doing all of the things that help the project stay
organized, move forward, and periodically make releases. I took it upon
myself to be the Arrow prime mover from the early days of the project,
but we now have a large enough user and contributor base that it is
unfair to me to continue bearing the load that I have in the past.

>
> So, overall I think a clear Call for Action at the top of the README
> could help. Like "Hey, we're looking for maintainers, you could start
> by reviewing some PRs and after some reviews maintainers will just be
> the last gatekeeper and after some more time, you can even merge PRs on
> your own".
>
> # My personal contribution
> Triggered by this call for help, I'll try to get more involved in
> Python, C++ and Rust reviews.
>
> So, these are some thoughts that I hope may help.
>

Thanks for these comments, and I much appreciate your help!

> Thanks again for addressing this issue and your time and passion,
> Marco
>
> On 2018/06/30 14:57:42, Wes McKinney <w...@gmail.com> wrote:
>> hi folks,
>>
>> Arrow has grown by leaps and bounds over the last 2.5 years. We are
>> approaching our 2000th patch and on track to surpass 200 unique
>> contributors by year end.
>>
>> All this contribution growth is great, but it has a hidden cost: the
>> maintenance. The burden of maintaining the project: particularly
>> reviewing and merging patches, has fallen on a very small number of
>> people. From the commit logs, we can see how many patches each
>> committer has merged:
>>
>> $ git shortlog -csn d5aa7c46692474376a3c31704cfc4783c86338f2..master
>>   1289  Wes McKinney
>>    268  Uwe L. Korn
>>     74  Korn, Uwe
>>     54  Antoine Pitrou
>>     52  Julien Le Dem
>>     39  Philipp Moritz
>>     18  Kouhei Sutou
>>     18  Steven Phillips
>>     13  Bryan Cutler
>>     11  Jacques Nadeau
>>     10  Phillip Cloud
>>      8  Brian Hulette
>>      5  Robert Nishihara
>>      5  adeneche
>>      4  GitHub
>>      3  Sidd
>>      3  siddharth
>>      1  AbdelHakim Deneche
>>      1  Your Name Here
>>
>> So Uwe and I have merged ~84% of the patches in the project so far.
>> This isn't a completely accurate reflection of the maintainer burden,
>> since many others contribute to code reviews and other aspects of
>> patch maintenance, and you have to be a committer to earn a place on
>> this list.
>>
>> I'm not sure what's the best way to address this problem. The quality
>> of our code review has declined at times as we struggle to keep up
>> with the flow of patches -- I don't think this is good. Having the
>> patch queue pile up isn't great either. Personally, I'm having a
>> difficult time balancing project maintenance and patch authoring,
>> particularly in the last 6 months.
>>
>> Unfortunately, many people believe that writing patches is the primary
>> mode of contribution to an open source project.
>> Apache projects explicitly state that non-patch contributions are
>> valued in earning karma (committership and PMC membership). We're
>> starting to have more corporate contributors come out of the woodwork,
>> and while it's great for contributors to be paid to write patches for
>> the project, they are rarely given the time and space to contribute
>> meaningfully to maintenance.
>>
>> Any thoughts about how we can grow the maintainership? Somehow we need
>> to reach ~5-6 core maintainers over the next year.
>>
>> Thanks,
>> Wes