This is a really interesting topic! I completely agree that we need to get ahead of this.
I would be really interested in learning of any experience other Apache projects, such as httpd or Tomcat, have with these issues.

--- E14 - typing on glass

On May 10, 2011, at 6:31 AM, "Steve Loughran" <[email protected]> wrote:

>
> Back in Jan 2011, I started a discussion about how to define Apache
> Hadoop Compatibility:
> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%[email protected]%3E
>
> I am now reading the EMC HD "Enterprise Ready" Apache Hadoop datasheet:
>
> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>
> It claims that their implementations are 100% compatible, even though
> the Enterprise edition uses a C filesystem. It also claims that both
> their software releases contain "Certified Stacks", without defining
> what Certified means, or who does the certification -- only that it is
> an improvement.
>
> I think we should revisit this issue before people with their own
> agendas define for us what compatibility with Apache Hadoop means.
>
> Licensing
> - Use of the Hadoop codebase must follow the Apache License:
>   http://www.apache.org/licenses/LICENSE-2.0
> - Plug-in components that are dynamically linked to (filesystems and
>   schedulers) don't appear to be derivative works, on my reading of this.
>
> Naming
> - This is something for branding@apache; they will have their opinions.
>   The key one is that the name "Apache Hadoop" must be used, and it's
>   important to make clear it is a derivative work.
> - I don't think you can claim to have a distribution/fork/version of
>   Apache Hadoop if you swap out big chunks of it for alternate
>   filesystems, MR engines, etc. Some description of this is needed:
>   "Supports the Apache Hadoop MapReduce engine on top of Filesystem XYZ".
>
> Compatibility
> - The definition of the Hadoop interfaces and classes is the Apache
>   source tree.
> - The definition of the semantics of the Hadoop interfaces and classes
>   is the Apache source tree, including the test classes.
> - The verification that the actual semantics of an Apache Hadoop
>   release match the expected semantics is that current and future
>   tests pass.
> - Bug reports can highlight incompatibility with the expectations of
>   community users, and once incorporated into tests they form part of
>   the compatibility testing.
> - Vendors can claim and even certify their derivative works as
>   compatible with other versions of their derivative works, but cannot
>   claim compatibility with Apache Hadoop unless their code passes the
>   tests and is consistent with the bug reports marked as "by design".
>   Perhaps we should have tests that verify each of these "by design"
>   bug reports, to make them more formal.
>
> Certification
> - I have no idea what this means in EMC's case; they just say "Certified".
> - As we don't do any certification ourselves, it would seem impossible
>   for us to certify that any derivative work is compatible.
> - It may be best to state that nobody can certify their derivative as
>   "compatible with Apache Hadoop" unless it passes all current test suites.
> - And require that anyone who declares compatibility define what they
>   mean by this.
>
> This is a good argument for getting more functional tests out there:
> whoever has more functional tests needs to get them into a test module
> that can be used to test real deployments.
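The "tests define compatibility" idea in the quoted mail can be sketched as a contract test: the expected semantics are written down once as assertions, and any implementation (the Apache reference one or a vendor's swapped-in filesystem) must pass the same suite. The sketch below is only illustrative; MiniFs, InMemoryFs, and runContract are hypothetical names, not Hadoop's actual FileSystem API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a pluggable filesystem interface.
interface MiniFs {
    void create(String path);        // create an empty file
    boolean exists(String path);
    void delete(String path);
}

// A trivial reference implementation; a vendor would supply their own.
class InMemoryFs implements MiniFs {
    private final Set<String> files = new HashSet<>();
    public void create(String path)  { files.add(path); }
    public boolean exists(String path) { return files.contains(path); }
    public void delete(String path)  { files.remove(path); }
}

public class FsContract {
    // The shared contract: every implementation must satisfy these
    // checks, so the assertions -- not prose -- define "compatible".
    static void runContract(MiniFs fs) {
        fs.create("/a");
        check(fs.exists("/a"), "a created file must exist");
        fs.delete("/a");
        check(!fs.exists("/a"), "a deleted file must not exist");
        check(!fs.exists("/never"), "an absent path must not exist");
    }

    static void check(boolean ok, String msg) {
        if (!ok) throw new AssertionError("contract violated: " + msg);
    }

    public static void main(String[] args) {
        runContract(new InMemoryFs()); // swap in a vendor implementation here
        System.out.println("contract passed");
    }
}
```

A "by design" bug report would then become one more assertion in runContract, making the documented quirk part of the executable definition.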
