Re: Defining Hadoop Compatibility -revisiting-

Matthew Foley Mon, 16 May 2011 14:18:57 -0700

It's important to distinguish between the name "Hadoop", which is protected by trademark law,
and the Hadoop implementation, which is licensed as open source under copyright law.


The term "derivative work" is, I believe, only relevant under copyright law, not trademark law.
(N.B., I'm not a lawyer -- and this email is my opinion, not my employer's.)  Since the Apache License
explicitly allows derivative works, I don't think it's a useful term for this discussion.

However, the ASF, and by delegation the Hadoop PMC, has a lot of control over the name,
and how we allow it to be used, under trademark law.  But to keep our rights under that
law, we have to enforce the trademark consistently.  So it's good that we're having this discussion,
and it's important to reach a conclusion, document it, and enforce it consistently.

There are a lot of subtleties; for instance, if I recall correctly from my days with Adobe and
PostScript(R), someone who has not licensed a trademark "X" can still claim "compatible with X"
as long as they ALSO make clear that the product is NOT, itself, an "X".  But you really need
a lawyer to get into that stuff.

--Matt


On May 16, 2011, at 5:00 AM, Segel, Mike wrote:

But Cloudera's release is a bit murky.

The math example is a bit flawed...

X represents the set of stable releases.
Y represents the set of available patches.
C represents the set of Cloudera releases.

So if C contains a release X(n) plus a set of patches that is contained in Y,
then does it not have the right to be considered Apache Hadoop?
It's my understanding that any enhancement to Hadoop is made available to Apache and will eventually make it into a later release...

So while it may not be 'official' release X(z), all of its components are in Apache.
(Note: I'm talking about the core components, not Cloudera's additional toolsets that encompass Hadoop.)

Cloudera is clearly a derivative work.
And IMHO is the only one which can say ... 'Includes Apache Hadoop'.

That doesn't mean that others can't, depending on how they implemented their changes.
Based on EMC marketing material, they've done a rip and replace of HDFS.
So it wouldn't be a superset, since it doesn't contain all of Apache Hadoop as a subset; instead it contains code that implements the API... So they can't say 'Includes Apache Hadoop', but they can say it's a derivative work based on Apache Hadoop and then go on to show how and why, in their opinion, their product is better (that's marketing for you...).

Clearly there are others out there...
Hadoop on Cassandra as an example...

Fragmentation of Hadoop will occur. It's inevitable. Too much money is on the table...

But because Apache's licensing is so open, Apache will have a hard time controlling derivative works...
I believe that Steve is incorrect in his assertion concerning a potential loss of patent protection. Again, Apache's licensing is very open, and as long as they follow Apache's Ts and Cs, they are covered.

Note: because I am sending this from my email address at my client, I am obliged to say that this email is my opinion and does not reflect the opinion of my client...
(you know the rest....)


Mike Segel

On May 16, 2011, at 6:02 AM, "Steve Loughran" <[email protected]> wrote:

On 13/05/11 23:57, Allen Wittenauer wrote:

On May 13, 2011, at 3:53 PM, Ted Dunning wrote:

But "distribution Z includes X" kind of implies the existence of some Y
such that X != Y, Y != empty-set and X+Y = Z, at least in common usage.

Isn't that the same as a non-trunk change?

So doesn't this mean that your question reduces to the question of what
happens when non-Apache changes are made to an Apache release?  And isn't
that the definition of a derived work?


Yup. Which is why I doubt *any* commercial entity can claim "includes Apache Hadoop" (including Cloudera).



But they can claim it is a derivative work, which CDH clearly is.
(Though if we were to come up with a formal declaration of what a
derivative work is, we'd have to handle the fact that it is a superset.
Even worse, you may realise a release is the ordered application of a
sequence of patches, and if the patches are applied in a different order
you may end up with a different body of source code...)

Something that implements the APIs may not be a derivative work,
depending on how much of the original code is in there. You could look
at the base classes and interfaces and produce a clean room
implementation (relying on the notion that interfaces are a list of
facts and not copyrightable in the US), but whoever does that may
encounter the issue that Google's donation of the right to use their MR
patent may not apply to such implementations.


