Hey John,

I'm not sure if Jim (proposed PMC chair) has surveyed all the PMC members
recently but I'm in touch with a good number of the PMC so I think I have a
good sense, and I'll take a whack at answering this:

It looks like, of the 24 proposed PMC members, 15 work at Cloudera, the
original organization which developed Impala prior to its contribution to
the incubator. The other 9 are at a mix of employers, but as best I know,
are not currently "sponsored" by their employers to contribute to the
Impala project.

As is natural with any project, it will be easier for people to be active
contributors and committers if someone is employing them to do so, and the
PMC membership reflects that. However, I've seen that the Impala community
has also made great efforts in a few areas:

- Jim has been making a bunch of "tutorial" style posts[1] to the dev
mailing list with instructions on how to navigate the Impala code base,
write tests, and fix example bugs for new contributors. I believe he has
also added these to various repositories like the "help wanted" page at the
ASF as well as "yourfirstpr" and "up-for-grabs" on github. Unfortunately it
seems like

- Similarly, the Impala community maintains a long list of "newbie" issues
which a new contributor can use to ramp up with[2]. In other communities
I've seen these "newbie" JIRAs are a nice way for people to take on a small
patch before getting started with something large, especially in large code
bases.

- Questions on the dev list from new contributors are always answered
promptly and with good amounts of encouragement, e.g. [3] and [4] from
earlier this week.

- In addition to trying to recruit new developers through the ways
described above, it's clear that design discussion, project decisions, and
code review are all being done in the open based on Apache principles. A
quick glance at recent dev@ archives shows various discussions about
project scope, implementation choices, etc. They've also been consistently
adding new committers and PMC members through incubation.

- Lastly, I'll note that in the last year the Impala community has started
to build closer ties with other ASF communities. For example, Impala
and Apache Kudu are now sharing common code for RPC, and representatives
from the Impala community are now contributing regularly on Apache Parquet
(two were recently voted committers). These cross-project collaborations
are, IMO, one of the things that make the ASF more than a "Switzerland for
intellectual property" as some have disparagingly described it, and it's
great to see Impala taking part in that.

That said, I don't want to paint a completely rosy picture: even with the
above efforts, it seems the majority of day-to-day contributions are still
coming from contributors affiliated with a single entity. The Impala
community has recognized that, and, based on the above efforts, I imagine
they plan to continue to try to expand the community even after graduation.

As for whether this should block graduation, I'll quote Roy here from a
2012 discussion:

>
> There is no diversity requirement at the ASF.  There is a behavior
> requirement for graduation and a behavior requirement for TLPs.
> We must not confuse the two.  If the Incubator says that there is a
> diversity requirement for graduation, ignore it (or at least figure
> out what the docs were supposed to say and then do that).


... and other proponents of the same philosophy can be found in other
graduation proposals.

It's clear to me that Impala is *behaving* like a TLP, and it's *despite*
their best efforts that the diversity hasn't improved as much as one might
hope. The unfortunate fact of life here is that the number of engineers out
there who are interested in contributing code to query engines is
relatively low (and most of those few are kept very busy by their
employers), so it's not terribly surprising that we haven't seen hordes
come out of the woodwork.

As for risk of abandonment, I believe that it's quite low. Cloudera has
historically contributed to many other ASF projects and with rare
exceptions the level of contribution has grown over time rather than
diminished. This is certainly the case with Impala as well, if you compare
the number of active contributors over time. Some of Cloudera's core
products are powered by Impala and run at a large number of enterprise
customers with multi-year support contracts, so it would be a pretty long
shot to imagine it being abandoned.

(Lest anyone shout "bias!", I'll disclose that Cloudera also pays my
salary, but on a different internal team.)

-Todd

[1] https://lists.apache.org/list.html?dev@impala.apache.org:lte=1y:%22New%20Impala%20Contributors%22
[2] https://issues.apache.org/jira/browse/IMPALA-6096?filter=12341668
[3] https://lists.apache.org/thread.html/02e19a37be25f3db07b874a7602b3c4ac66d2e1499b66bda53b561f6@%3Cdev.impala.apache.org%3E
[4] https://lists.apache.org/thread.html/4e77e2f13a9a69fa7c55c413bfc529a439c63aa60458649d5ece072d@%3Cdev.impala.apache.org%3E


On Mon, Nov 6, 2017 at 2:00 PM, John D. Ament <johndam...@apache.org> wrote:

> The only question I have is the typical distribution of proposed PMC
> members to companies.
>
> John
>
>
