Hi Matthew,
I'm sorry I have to disagree.
If you change a bit in a work, it becomes a derived work. There's no
"demotion" involved; that's just the definition of a derived work.
There's no ambiguity. Either you ship the bits that the Apache PMC has
voted on as a release, or you change them (even one bit) and they are no
longer what the PMC voted on. It's a derived work.
The rules for voting in Apache require that if you change a bit in an
artifact, you can no longer count votes for the previous artifact.
Because the new work is different. A new vote is required.
Not gray. Black and white.
Simple as that.
Craig
P.S. For the anthropologists: look at the history of Apache Derby and
Sun JavaDB. A meaningful, specific example.
On Jun 15, 2011, at 6:17 PM, Matthew Foley wrote:
I tend to agree with what I think you are saying: that applying, to an
Apache Hadoop release, a small number of patches that
* fix high-severity bugs, and
* have been committed to Apache Hadoop
should not demote the result to a "derived work".
However, if so many patches are applied that the result cannot be
meaningfully correlated with a specific Apache Hadoop release, then it
probably has become a derived work.
But how do we draw a meaningful line across that big gray area? That's
why I'd like to see specific text from one of the other projects you
cited as an example.
Thanks,
--Matt
On Jun 15, 2011, at 6:02 PM, Eli Collins wrote:
On Wed, Jun 15, 2011 at 10:44 AM, Matthew Foley <mattf@yahoo-inc.com> wrote:
Eli, you said:
Putting a build of Hadoop that has 4 security patches applied into the
same category as a product that has entirely re-worked the code and not
gotten it checked into trunk does a major disservice to the people who
contribute to and invest in the project.
How would you phrase the distinction, so that it is clear and reasonably
unambiguous for people who are not Hadoop developers? Do the HTTPD and
Subversion policies draw this distinction, and if so could you please
point us at the specific text, or copy that text to this thread?
I'll try to find it; this was told to me verbally a while back. Maybe
Roy can chime in.
Since there seems to be some confusion around distribution, we should
make this explicit. Some people are currently interpreting the
guidelines to say that if you patch an Apache Hadoop release yourself,
then you're still running Apache Hadoop, but if a vendor patches Apache
Hadoop for you, then you're not running Apache Hadoop. And if a
subcontractor patches Apache Hadoop for you, is it Apache Hadoop then?
This isn't sustainable.
Thanks,
Eli
Thanks,
--Matt
On Jun 15, 2011, at 9:40 AM, Eli Collins wrote:
On Tue, Jun 14, 2011 at 7:45 PM, Owen O'Malley <[email protected]> wrote:
On Jun 14, 2011, at 5:48 PM, Eli Collins wrote:
Regarding derivative works: it's not clear from the document, but I
think we should explicitly adopt the policy of HTTPD and Subversion
that backported patches from trunk and security fixes are permitted.
Actually, the document is extremely clear that only Apache
releases may be called Hadoop.
There was a very long thread about why the rapidly expanding Hadoop
ecosystem is leading to a lot of customer confusion about the different
"versions" of Hadoop. We as the Hadoop project don't have the resources
or the necessary compatibility test suite to test compatibility between
the different sets of cherry-picked patches. We also don't have time to
ensure that all of the thousands of patches applied to 0.20.2 in each
of the many (10? 15?) different versions have been committed to trunk.
Furthermore, under the Apache license, a company Foo could claim that
its product is a cherry-picked version of Hadoop without releasing the
source code that would enable verification.
In summary,
1. Hadoop is very successful.
2. There are many different commercial products that are trying to
use the Hadoop name.
3. We can't check or enforce that the cherry-picked versions are
following the rules.
4. We don't have a TCK like Java does to validate new versions are
compatible.
5. By far the fairest way to ensure compatibility and fairness
between companies is that only Apache Hadoop releases may be
called Hadoop.
That said, a package that includes a small number (< 3) of
security patches that haven't been released yet doesn't seem
unreasonable.
I've spoken with ops teams at many companies, and I am not aware of
anyone who runs an official release (even with just 2 security patches
applied). By this definition many of the most valuable contributors to
Hadoop, including Yahoo!, Cloudera, Facebook, etc. are not using
Hadoop. Is that really the message we want to send? Do we expect the
PMC to enforce this equally across all parties?
It's a fact of life that companies and ops teams that support Hadoop
need to patch the software before the PMC has the time and/or will to
vote on new releases. This is why HTTPD and Subversion allow it.
Putting a build of Hadoop that has 4 security patches applied into the
same category as a product that has entirely re-worked the code and not
gotten it checked into trunk does a major disservice to the people who
contribute to and invest in the project.
Thanks,
Eli
Craig L Russell
Secretary, Apache Software Foundation
Chair, OpenJPA PMC
[email protected] http://db.apache.org/jdo