I think it's time to separate out the functional tests as a "Hadoop Compatibility Kit" (HCK), similar to the Sun TCK for Java but under ASL 2.0. "Certification" would then mean "passes 100% of the HCK test suite."
- milind

--
Milind Bhandarkar
[email protected]

On 5/11/11 2:24 PM, "Eric Baldeschwieler" <[email protected]> wrote:

>This is a really interesting topic! I completely agree that we need to
>get ahead of this.
>
>I would be really interested in learning of any experience other Apache
>projects, such as Apache or Tomcat, have with these issues.
>
>---
>E14 - typing on glass
>
>On May 10, 2011, at 6:31 AM, "Steve Loughran" <[email protected]> wrote:
>
>> Back in Jan 2011, I started a discussion about how to define Apache
>> Hadoop compatibility:
>>
>> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D[email protected]%3E
>>
>> I am now reading the EMC HD "Enterprise Ready" Apache Hadoop datasheet:
>>
>> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>>
>> It claims that their implementations are 100% compatible, even though
>> the Enterprise edition uses a C filesystem. It also claims that both
>> their software releases contain "Certified Stacks", without defining
>> what "Certified" means or who does the certification; only that it is
>> an improvement.
>>
>> I think we should revisit this issue before people with their own
>> agendas define for us what compatibility with Apache Hadoop means.
>>
>> Licensing
>> - Use of the Hadoop codebase must follow the Apache License:
>>   http://www.apache.org/licenses/LICENSE-2.0
>> - Plug-in components that are dynamically linked to (filesystems and
>>   schedulers) don't appear to be derivative works, on my reading of
>>   this.
>>
>> Naming
>> - This is something for branding@apache; they will have their opinions.
>>   The key one is that the name "Apache Hadoop" must be used, and it is
>>   important to make clear that a product is a derivative work.
>> - I don't think you can claim to have a distribution/fork/version of
>>   Apache Hadoop if you swap out big chunks of it for alternate
>>   filesystems, MR engines, etc.
>>   Some description of this is needed, e.g. "Supports the Apache Hadoop
>>   MapReduce engine on top of Filesystem XYZ".
>>
>> Compatibility
>> - The definition of the Hadoop interfaces and classes is the Apache
>>   source tree.
>> - The definition of the semantics of the Hadoop interfaces and classes
>>   is the Apache source tree, including the test classes.
>> - The verification that the actual semantics of an Apache Hadoop
>>   release match the expected semantics is that current and future
>>   tests pass.
>> - Bug reports can highlight incompatibility with the expectations of
>>   community users, and once incorporated into tests they form part of
>>   the compatibility testing.
>> - Vendors can claim, and even certify, their derivative works as
>>   compatible with other versions of their derivative works, but they
>>   cannot claim compatibility with Apache Hadoop unless their code
>>   passes the tests and is consistent with the bug reports marked as
>>   "by design". Perhaps we should have tests that verify each of these
>>   "by design" bug reports, to make them more formal.
>>
>> Certification
>> - I have no idea what this means in EMC's case; they just say
>>   "Certified".
>> - As we don't do any certification ourselves, it would seem impossible
>>   for us to certify that any derivative work is compatible.
>> - It may be best to state that nobody can certify their derivative as
>>   "compatible with Apache Hadoop" unless it passes all current test
>>   suites.
>> - And require that anyone who declares compatibility define what they
>>   mean by it.
>>
>> This is a good argument for getting more functional tests out there:
>> whoever has more functional tests needs to get them into a test module
>> that can be used to test real deployments.
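To make the "functional tests as a compatibility definition" idea concrete, here is a minimal sketch of what one entry in such a test module could look like: a "contract test" that asserts observable filesystem semantics rather than implementation details. The class and method names are hypothetical, and it exercises `java.nio.file` purely for illustration; a real kit would run the same style of checks against `org.apache.hadoop.fs.FileSystem` implementations.

```java
// Hypothetical sketch of a filesystem "contract test", the kind of
// functional check a compatibility kit could collect. Names are
// illustrative, not from any actual Hadoop test module.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FsContractSketch {

    // Contract: bytes written to a path must be readable back unchanged.
    static void checkWriteReadBack(Path dir) throws IOException {
        Path file = dir.resolve("contract.txt");
        byte[] payload = "hello, hadoop".getBytes(StandardCharsets.UTF_8);
        Files.write(file, payload);
        byte[] readBack = Files.readAllBytes(file);
        if (!java.util.Arrays.equals(payload, readBack)) {
            throw new AssertionError("read-back data differs from written data");
        }
    }

    // Contract: renaming onto a fresh destination moves the file;
    // the source must no longer exist afterwards.
    static void checkRename(Path dir) throws IOException {
        Path src = dir.resolve("src.txt");
        Path dst = dir.resolve("dst.txt");
        Files.write(src, new byte[] {1, 2, 3});
        Files.move(src, dst);
        if (Files.exists(src) || !Files.exists(dst)) {
            throw new AssertionError("rename semantics violated");
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hck-sketch");
        checkWriteReadBack(dir);
        checkRename(dir);
        System.out.println("all contract checks passed");
    }
}
```

The point of this style is that each check encodes one sentence of expected semantics, so a derivative work with a swapped-in filesystem either passes the same assertions or visibly diverges from them.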
