+1. The Apache Foundation, or contributors to Apache, should not waste
their energy providing such certification.
Compatibility claims should be easily verifiable by users of these
proprietary systems, or by independent observers, if a test suite were
readily available to run.

> The Hadoop mark should only be used to refer to open-source software
> produced by the ASF.

IANAL, but Steve is questioning the use of "Apache Hadoop Compatible" in
the PR material of commercial software. Is this considered usage of "the
Hadoop mark"?

- milind

--
Milind Bhandarkar
[email protected]
+1-650-776-3167

On 5/12/11 11:16 PM, "Doug Cutting" <[email protected]> wrote:

> Certification seems like mission creep. Our mission is to produce
> open-source software. If we wish to produce testing software, that
> seems fine. But running a certification program for non-open-source
> software seems like a different task.
>
> The Hadoop mark should only be used to refer to open-source software
> produced by the ASF. If other folks wish to make factual statements
> concerning our software, e.g., that their proprietary software passes
> tests that we've created, that may be fine, but I don't think we should
> validate those claims by granting certifications to institutions. That
> ventures outside the mission of the ASF. We are not an accrediting
> organization.
>
> Doug
>
> On 05/10/2011 12:29 PM, Steve Loughran wrote:
>>
>> Back in Jan 2011, I started a discussion about how to define Apache
>> Hadoop compatibility:
>>
>> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%[email protected]%3E
>>
>> I am now reading the EMC HD "Enterprise Ready" Apache Hadoop datasheet:
>>
>> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>>
>> It claims that their implementations are 100% compatible, even though
>> the Enterprise edition uses a C filesystem. It also claims that both
>> their software releases contain "Certified Stacks", without defining
>> what "Certified" means, or who does the certification -- only that it
>> is an improvement.
>>
>> I think we should revisit this issue before people with their own
>> agendas define what compatibility with Apache Hadoop means for us.
>>
>> Licensing
>> - Use of the Hadoop codebase must follow the Apache License:
>>   http://www.apache.org/licenses/LICENSE-2.0
>> - Plug-in components that are dynamically linked to (filesystems and
>>   schedulers) don't appear to be derivative works, on my reading of
>>   this.
>>
>> Naming
>> - This is something for branding@apache; they will have their opinions.
>>   The key one is that the name "Apache Hadoop" must get used, and it's
>>   important to make clear it is a derivative work.
>> - I don't think you can claim to have a distribution/fork/version of
>>   Apache Hadoop if you swap out big chunks of it for alternate
>>   filesystems, MR engines, etc. Some description of this is needed:
>>   "Supports the Apache Hadoop MapReduce engine on top of Filesystem
>>   XYZ".
>>
>> Compatibility
>> - The definition of the Hadoop interfaces and classes is the Apache
>>   source tree.
>> - The definition of the semantics of the Hadoop interfaces and classes
>>   is the Apache source tree, including the test classes.
>> - The verification that the actual semantics of an Apache Hadoop
>>   release are compatible with the expected semantics is that current
>>   and future tests pass.
>> - Bug reports can highlight incompatibility with the expectations of
>>   community users, and once incorporated into tests they form part of
>>   the compatibility testing.
>> - Vendors can claim and even certify their derivative works as
>>   compatible with other versions of their derivative works, but cannot
>>   claim compatibility with Apache Hadoop unless their code passes the
>>   tests and is consistent with the bug reports marked as "by design".
>>   Perhaps we should have tests that verify each of these "by design"
>>   bug reports to make them more formal.
>>
>> Certification
>> - I have no idea what this means in EMC's case; they just say
>>   "Certified".
>> - As we don't do any certification ourselves, it would seem impossible
>>   for us to certify that any derivative work is compatible.
>> - It may be best to state that nobody can certify their derivative as
>>   "compatible with Apache Hadoop" unless it passes all current test
>>   suites.
>> - And to require that anyone who declares compatibility define what
>>   they mean by it.
>>
>> This is a good argument for getting more functional tests out there:
>> whoever has more functional tests needs to get them into a test module
>> that can be used to test real deployments.
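The reusable test module Steve describes can be sketched as a "contract
test": an abstract suite of semantic assertions that any implementation
must pass unchanged. The sketch below is in Python rather than Hadoop's
Java, and `InMemoryFS` with its `create`/`read`/`delete` methods is an
invented stand-in for a real filesystem, not any actual Hadoop API:

```python
import unittest


class InMemoryFS:
    """Toy dict-backed filesystem standing in for an implementation
    under test (hypothetical, for illustration only)."""

    def __init__(self):
        self._files = {}

    def create(self, path, data=b""):
        # Creating over an existing path is an error by design.
        if path in self._files:
            raise FileExistsError(path)
        self._files[path] = data

    def read(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]

    def delete(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        del self._files[path]


class FileSystemContractTest(unittest.TestCase):
    """Semantics every 'compatible' implementation must honour.
    A vendor would subclass this and override make_fs() to return
    their own filesystem; the assertions themselves stay fixed."""

    def make_fs(self):
        return InMemoryFS()

    def test_read_returns_written_bytes(self):
        fs = self.make_fs()
        fs.create("/a", b"hello")
        self.assertEqual(fs.read("/a"), b"hello")

    def test_create_over_existing_path_fails(self):
        fs = self.make_fs()
        fs.create("/a")
        self.assertRaises(FileExistsError, fs.create, "/a")

    def test_delete_missing_path_fails(self):
        fs = self.make_fs()
        self.assertRaises(FileNotFoundError, fs.delete, "/missing")
```

Passing the shared, unmodified suite -- rather than a self-declared
"Certified" label -- is what would back a compatibility claim; "by
design" bug reports would become additional test methods in the same
module.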
