+1 to what Eli says. If nobody is running official Hadoop according to this definition, but everybody thinks that they are running Hadoop, then this definition is a bit out of whack. The source of the dissonance is related to the fact that releases just don't happen often enough in Hadoop.

In addition, I think that the limitations on usage are too strict. For instance, if "QuickBooks for Windows" [1] doesn't cause Microsoft to sue Intuit, then "Joe's Foo for Apache Hadoop" really shouldn't cause any more grief.

So I would give a (non-binding) -1 to the policy as stated.

[1] http://quickbooks.intuit.com/product/accounting_software/windows_financial_management_software.jsp

On Wed, Jun 15, 2011 at 6:40 PM, Eli Collins <[email protected]> wrote:

> On Tue, Jun 14, 2011 at 7:45 PM, Owen O'Malley <[email protected]> wrote:
> >
> > On Jun 14, 2011, at 5:48 PM, Eli Collins wrote:
> >
> >> Wrt derivative works, it's not clear from the document, but I think we
> >> should explicitly adopt the policy of HTTPD and Subversion that
> >> backported patches from trunk and security fixes are permitted.
> >
> > Actually, the document is extremely clear that only Apache releases
> > may be called Hadoop.
> >
> > There was a very long thread about why the rapidly expanding
> > Hadoop-ecosystem is leading to a lot of customer confusion about the
> > different "versions" of Hadoop. We as the Hadoop project don't have
> > the resources or the necessary compatibility test suite to test
> > compatibility between the different sets of cherry picked patches. We
> > also don't have time to ensure that all of the 1,000's of patches
> > applied to 0.20.2 in each of the many (10? 15?) different versions
> > have been committed to trunk. Furthermore, under the Apache license, a
> > company Foo could claim that it is a cherry pick version of Hadoop
> > without releasing their source code that would enable verification.
> >
> > In summary,
> > 1. Hadoop is very successful.
> > 2. There are many different commercial products that are trying to
> >    use the Hadoop name.
> > 3. We can't check or enforce that the cherry pick versions are
> >    following the rules.
> > 4. We don't have a TCK like Java does to validate new versions are
> >    compatible.
> > 5. By far the most fair way to ensure compatibility and fairness
> >    between companies is that only Apache Hadoop releases may be
> >    called Hadoop.
> >
> > That said, a package that includes a small number (< 3) of security
> > patches that haven't been released yet doesn't seem unreasonable.
> >
>
> I've spoken with ops teams at many companies, I am not aware of
> anyone who runs an official release (with just 2 security patches). By
> this definition many of the most valuable contributors to Hadoop,
> including Yahoo!, Cloudera, Facebook, etc are not using Hadoop. Is
> that really the message we want to send? We expect the PMC to enforce
> this equally across all parties?
>
> It's a fact of life that companies and ops teams that support Hadoop
> need to patch the software before the PMC has time and/or will to vote
> on new releases. This is why HTTP and Subversion allow this. Putting a
> build of Hadoop that has 4 security patches applied into the same
> category as a product that has entirely re-worked the code and not
> gotten it checked into trunk does a major disservice to the people who
> contribute to and invest in the project.
>
> Thanks,
> Eli
