Re: Fwd: [VOTE] Shall we adopt the "Defining Hadoop" page

Andrew Purtell Mon, 20 Jun 2011 09:39:58 -0700

Hi Jeff,

First, apologies for trimming most of your argument from the quote for clarity.
Readers can find it in the general@ archives, I am sure.


> Lastly, I'd love to learn more about how other prominent open source
> projects have approached this issue. If you have any knowledge about
> how Linux handled the use of its trademark, please add your
> thoughts to
> http://www.quora.com/What-are-the-rules-for-using-the-Linux-trademark-in-a-product-name.
> Because Apache Hadoop is a kernel technology, similar to Linux, I
> suspect there are many useful lessons to learn. Or at least crazy
> email threads to read.

I would argue that the trademark concern has an additional dimension here, and
perhaps an additional and fairly core motivation for protecting the mark,
because these are open source projects. The mention of Linux helps to
illustrate it.

The obvious difference between Hadoop and Linux is that Linux has a clear,
universally recognized hierarchy with a single -- and exceptional, and quickly
and forcefully opinionated -- authority at the top. For Linux, the power to
define Linux rests, obviously, with Linus. For Hadoop, the power to do anything,
including defining what Hadoop is, is diffuse.

For would-be open source contributors to the Linux kernel, the canonical source
of the kernel is clearly Linus' tree, and that is where you want your
contribution to end up. He is the authority. Linux will always be defined by
Linus until he is gone. (That is a long-term problem for Linux, of course.) It
is a benevolent dictatorship that, perhaps uniquely, works: it allows enough
contributors to see the fruits of their labor to sustain it while
simultaneously maintaining a strong identity.

Hadoop has no equivalent.

Linux, for now at least, can be quite liberal in how the Linux mark is used
because of how clearly its identity as a project is defined, and therefore how
secure its ability to attract contributions is.

Hadoop, I think, needs to be more careful. What triggered this discussion is
the arrival of new players releasing products they call Hadoop but containing
significant changes that the community, by way of the ASF umbrella we all work
under, had no part in designing or developing. And some of these are being
open sourced as a Hadoop. There is no Linus here. Which of these is _the_
Hadoop? As a would-be contributor, which should I select?

Already we have some issues. In some cases I'd rather contribute to Cloudera 
sources because at least I know my contribution to CDH will see a timely 
release.

Furthermore, I believe the extent to which users see value in ASF Hadoop, and 
have a clear definition of what ASF Hadoop is, will be correlated with the 
extent to which the ASF can attract enough contributions to Hadoop to sustain 
innovation against competing technologies.

The open source value proposition "I contribute to Hadoop" affects the
long-term survival of the project. Individuals and organizations are both
motivated by this, for various reasons.

   - Andy
