What does it mean to "implement" those interfaces? I'm +1 for a TCK-based definition. In addition to statically implementing a set of interfaces, each interface also implicitly carries a set of acceptable inputs and expected outputs (or ranges of outputs) for those inputs.
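A minimal sketch of what such a TCK-style contract test could look like. KeyValueStore and InMemoryStore are invented stand-ins here, not real Hadoop interfaces - the point is that the test pins down behaviour for given inputs, not just method signatures:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a public-stable annotated interface.
interface KeyValueStore {
    void put(String key, String value);
    String get(String key); // contract: returns null for absent keys
}

// Reference implementation, used only to illustrate the contract.
class InMemoryStore implements KeyValueStore {
    private final Map<String, String> map = new HashMap<>();
    public void put(String key, String value) { map.put(key, value); }
    public String get(String key) { return map.get(key); }
}

public class ContractCheck {
    // The "TCK": behaviour any claimed-compatible implementation must show.
    static void verify(KeyValueStore store) {
        store.put("a", "1");
        if (!"1".equals(store.get("a")))
            throw new AssertionError("get after put must return the stored value");
        if (store.get("missing") != null)
            throw new AssertionError("get of an absent key must return null");
    }

    public static void main(String[] args) {
        verify(new InMemoryStore());
        System.out.println("contract passed");
    }
}
```

A vendor implementation would be run through the same verify() method; passing it - not merely compiling against the interface - is what "implements" would mean.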
- Aaron

On Wed, May 11, 2011 at 3:56 PM, Jacob R Rideout <[email protected]> wrote:
> What about defining compatibility as fully implementing all the
> public-stable annotated interfaces for a particular release?
>
> Jacob Rideout
>
> On Wed, May 11, 2011 at 4:42 PM, Ian Holsman <[email protected]> wrote:
> > For apache (httpd I'm assuming you mean), we define compatibility as
> > adherence to the set of RFCs that define the HTTP protocol.
> >
> > I'm no expert in this (Roy is, though), but we could attempt to do
> > something similar when it comes to the HDFS/MapReduce protocols. I'm
> > not sure what benefit there would be to going to an RFC, as opposed
> > to documenting the API on our site.
> >
> > On May 12, 2011, at 7:24 AM, Eric Baldeschwieler wrote:
> >
> >> This is a really interesting topic! I completely agree that we need
> >> to get ahead of this.
> >>
> >> I would be really interested in learning of any experience other
> >> Apache projects, such as httpd or Tomcat, have with these issues.
> >>
> >> ---
> >> E14 - typing on glass
> >>
> >> On May 10, 2011, at 6:31 AM, "Steve Loughran" <[email protected]> wrote:
> >>
> >>> Back in Jan 2011, I started a discussion about how to define Apache
> >>> Hadoop compatibility:
> >>> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%[email protected]%3E
> >>>
> >>> I am now reading the EMC HD "Enterprise Ready" Apache Hadoop datasheet:
> >>> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
> >>>
> >>> It claims that their implementations are 100% compatible, even
> >>> though the Enterprise edition uses a C filesystem. It also claims
> >>> that both their software releases contain "Certified Stacks",
> >>> without defining what "Certified" means, or who does the
> >>> certification - only that it is an improvement.
> >>>
> >>> I think we should revisit this issue before people with their own
> >>> agendas define what compatibility with Apache Hadoop is for us.
> >>>
> >>> Licensing
> >>> -Use of the Hadoop codebase must follow the Apache License
> >>> http://www.apache.org/licenses/LICENSE-2.0
> >>> -Plug-in components that are dynamically linked to (filesystems and
> >>> schedulers) don't appear to be derivative works on my reading of this.
> >>>
> >>> Naming
> >>> -This is something for branding@apache; they will have their
> >>> opinions. The key one is that the name "Apache Hadoop" must get
> >>> used, and it's important to make clear it is a derivative work.
> >>> -I don't think you can claim to have a distribution/fork/version of
> >>> Apache Hadoop if you swap out big chunks of it for alternate
> >>> filesystems, MR engines, etc. Some description of this is needed:
> >>> "Supports the Apache Hadoop MapReduce engine on top of Filesystem XYZ".
> >>>
> >>> Compatibility
> >>> -The definition of the Hadoop interfaces and classes is the Apache
> >>> source tree.
> >>> -The definition of the semantics of the Hadoop interfaces and
> >>> classes is the Apache source tree, including the test classes.
> >>> -The verification that the actual semantics of an Apache Hadoop
> >>> release are compatible with the expected semantics is that current
> >>> and future tests pass.
> >>> -Bug reports can highlight incompatibility with the expectations of
> >>> community users, and once incorporated into tests they form part of
> >>> the compatibility testing.
> >>> -Vendors can claim and even certify their derivative works as
> >>> compatible with other versions of their derivative works, but
> >>> cannot claim compatibility with Apache Hadoop unless their code
> >>> passes the tests and is consistent with the bug reports marked as
> >>> "by design". Perhaps we should have tests that verify each of these
> >>> "by design" bug reports to make them more formal.
> >>>
> >>> Certification
> >>> -I have no idea what this means in EMC's case; they just say "Certified".
> >>> -As we don't do any certification ourselves, it would seem
> >>> impossible for us to certify that any derivative work is compatible.
> >>> -It may be best to state that nobody can certify their derivative
> >>> as "compatible with Apache Hadoop" unless it passes all current
> >>> test suites.
> >>> -And require that anyone who declares compatibility define what
> >>> they mean by this.
> >>>
> >>> This is a good argument for getting more functional tests out there
> >>> -whoever has more functional tests needs to get them into a test
> >>> module that can be used to test real deployments.
