I have a concern. HDFS, MapReduce, HBase, Hive, and Pig taken
together form a coherent software stack. I suspect that many users
see this as a whole, in the same way they see a Linux distribution as
a whole, without remembering that Linux is really just the kernel and
that GNU and other components are added to form the distribution. HDFS and
MapReduce form a base on which the other projects depend. HBase,
Hive, and Pig extend that base to many more users, both by
making it easier to use and by adding new functionality.
Thus splitting them up does not make sense to me. They form a whole;
why not keep them as a whole?
I know that the response will be that becoming a top-level project
doesn't mean they cannot continue to function as a whole; that this is
merely a governance issue and not an issue of how projects work
together (see for example http://bit.ly/9ylAYS). But I remain
skeptical. The structure of a governing hierarchy always influences the
cohesion of a group. It would help me if advocates of this position
could point to successful, separate Apache TLPs that are either
completely dependent on another project (as HBase would be on Hadoop)
or significantly dependent to extend functionality (as Hadoop would be
on HBase).
This is not to say that I do not understand the value of having PMCs
from these growing subprojects report directly to Apache. But I am
concerned that we are letting this valid concern overrule other
equally valid concerns without considering the tradeoffs.
Alan.
On Apr 7, 2010, at 9:02 PM, Stack wrote:
The HBase subproject has voted to become a TLP: http://su.pr/1g0HAN
Does the Hadoop community have any questions or concerns about this
proposal?
Please don't vote yet in response to this. I'll call a formal vote
after questions, if any, have been resolved.
Thanks,
St.Ack
(Thanks for the boilerplate, Doug Cutting)