Cos, can you give me an example of a "system test" that is not a functional test? My assumption was that the functionality being tested is specific to a component, and that inter-component interactions (that's what you meant, right?) would be taken care of by the public interface and semantics of a component's API.
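To make sure we are talking about the same thing, here is roughly the split I picture. These are sketches only -- the class, path, and helper names are made up for illustration and are not actual Hadoop or HCK test code:

import static org.junit.Assert.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class FunctionalVsSystemSketch {

  // Functional test (as I understand it): exercises one component's
  // public API contract in isolation -- here, the FileSystem API of
  // whatever fs.defaultFS points at.
  @Test
  public void createdFileIsVisibleWithCorrectLength() throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/hck-sketch/one.txt");
    FSDataOutputStream out = fs.create(p, true);
    out.write("hello".getBytes("UTF-8"));
    out.close();
    assertTrue(fs.exists(p));
    assertEquals(5L, fs.getFileStatus(p).getLen());
    fs.delete(p, false);
  }

  // System test (as I read your mail): validates an inter-component
  // interaction end to end, e.g. the MapReduce framework reading its
  // splits from and committing its output to the DFS. Deliberately
  // schematic here:
  @Test
  public void mapReduceJobRoundTripsThroughDfs() throws Exception {
    // 1. stage known input under /tmp/hck-sketch/input on the cluster fs
    // 2. submit a trivial identity job through the MapReduce client API
    // 3. assert that the committed output matches the staged input
    // A functional test of either component alone would not catch, say,
    // an output committer that misuses FileSystem.rename().
  }
}

Is the second kind what you are calling a system test?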
- milind

--
Milind Bhandarkar
[email protected]
+1-650-776-3167


On 5/12/11 3:30 PM, "Konstantin Boudnik" <[email protected]> wrote:

>On Thu, May 12, 2011 at 09:45, Milind Bhandarkar <[email protected]> wrote:
>> HCK and written specifications are not mutually exclusive. However, given
>> the evolving nature of Hadoop APIs, functional tests need to evolve as
>
>I would actually expand it to 'functional and system tests' because
>latter are capable of validating inter-component iterations not
>coverable by functional tests.
>
>Cos
>
>> well, and having them tied to a "current stable" version is easier to do
>> than it is to tie the written specifications.
>>
>> - milind
>>
>> --
>> Milind Bhandarkar
>> [email protected]
>> +1-650-776-3167
>>
>>
>> On 5/11/11 7:26 PM, "M. C. Srivas" <[email protected]> wrote:
>>
>>>While the HCK is a great idea to check quickly if an implementation is
>>>"compliant", we still need a written specification to define what is meant
>>>by compliance, something akin to a set of RFC's, or a set of docs like the
>>>IEEE POSIX specifications.
>>>
>>>For example, the POSIX.1c pthreads API has a written document that specifies
>>>all the function calls, input params, return values, and error codes. It
>>>clearly indicates what any POSIX-compliant threads package needs to support,
>>>and what are vendor-specific non-portable extensions that one can use at
>>>one's own risk.
>>>
>>>Currently we have 2 sets of API in the DFS and Map/Reduce layers, and the
>>>specification is extracted only by looking at the code, or (where the code
>>>is non-trivial) by writing really bizarre test programs to examine corner
>>>cases. Further, the interaction between a mix of the old and new APIs is not
>>>specified anywhere. Such specifications are vitally important when
>>>implementing libraries like Cascading, Mahout, etc. For example, an
>>>application might open a file using the new API, and pass that stream into a
>>>library that manipulates the stream using some of the old API ... what is
>>>then the expectation of the state of the stream when the library call
>>>returns?
>>>
>>>Sanjay Radia @ Y! already started specifying some of the DFS APIs to nail
>>>such things down. There's similar good effort in the Map/Reduce and Avro
>>>spaces, but it seems to have stalled somewhat. We should continue it.
>>>
>>>Doing such specs would be a great service to the community and the users of
>>>Hadoop. It provides them
>>> (a) clear-cut docs on how to use the Hadoop APIs
>>> (b) wider choice of Hadoop implementations by freeing them from vendor
>>>lock-in.
>>>
>>>Once we have such specification, the HCK becomes meaningful (since the HCK
>>>itself will be buggy initially).
>>>
>>>
>>>On Wed, May 11, 2011 at 2:46 PM, Milind Bhandarkar
>>><[email protected]> wrote:
>>>
>>>> I think it's time to separate out functional tests as a "Hadoop
>>>> Compatibility Kit (HCK)", similar to the Sun TCK for Java, but under ASL
>>>> 2.0. Then "certification" would mean "Passes 100% of the HCK testsuite."
>>>>
>>>> - milind
>>>> --
>>>> Milind Bhandarkar
>>>> [email protected]
>>>>
>>>>
>>>> On 5/11/11 2:24 PM, "Eric Baldeschwieler" <[email protected]> wrote:
>>>>
>>>> >This is a really interesting topic! I completely agree that we need to
>>>> >get ahead of this.
>>>> >
>>>> >I would be really interested in learning of any experience other apache
>>>> >projects, such as apache or tomcat have with these issues.
>>>> >
>>>> >---
>>>> >E14 - typing on glass
>>>> >
>>>> >On May 10, 2011, at 6:31 AM, "Steve Loughran" <[email protected]> wrote:
>>>> >
>>>> >>
>>>> >> Back in Jan 2011, I started a discussion about how to define Apache
>>>> >> Hadoop Compatibility:
>>>> >>
>>>> >> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D[email protected]%3E
>>>> >>
>>>> >> I am now reading EMC HD "Enterprise Ready" Apache Hadoop datasheet
>>>> >>
>>>> >> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>>>> >>
>>>> >> It claims that their implementations are 100% compatible, even though
>>>> >> the Enterprise edition uses a C filesystem. It also claims that both
>>>> >> their software releases contain "Certified Stacks", without defining
>>>> >> what Certified means, or who does the certification -only that it is an
>>>> >> improvement.
>>>> >>
>>>> >>
>>>> >> I think we should revisit this issue before people with their own
>>>> >> agendas define what compatibility with Apache Hadoop is for us
>>>> >>
>>>> >>
>>>> >> Licensing
>>>> >> -Use of the Hadoop codebase must follow the Apache License
>>>> >> http://www.apache.org/licenses/LICENSE-2.0
>>>> >> -plug in components that are dynamically linked to (Filesystems and
>>>> >> schedulers) don't appear to be derivative works on my reading of this,
>>>> >>
>>>> >> Naming
>>>> >> -this is something for branding@apache, they will have their opinions.
>>>> >> The key one is that the name "Apache Hadoop" must get used, and it's
>>>> >> important to make clear it is a derivative work.
>>>> >> -I don't think you can claim to have a Distribution/Fork/Version of
>>>> >> Apache Hadoop if you swap out big chunks of it for alternate
>>>> >> filesystems, MR engines, etc. Some description of this is needed
>>>> >> "Supports the Apache Hadoop MapReduce engine on top of Filesystem XYZ"
>>>> >>
>>>> >> Compatibility
>>>> >> -the definition of the Hadoop interfaces and classes is the Apache
>>>> >> Source tree,
>>>> >> -the definition of semantics of the Hadoop interfaces and classes is
>>>> >> the Apache Source tree, including the test classes.
>>>> >> -the verification that the actual semantics of an Apache Hadoop
>>>> >> release is compatible with the expected semantics is that current and
>>>> >> future tests pass
>>>> >> -bug reports can highlight incompatibility with expectations of
>>>> >> community users, and once incorporated into tests form part of the
>>>> >> compatibility testing
>>>> >> -vendors can claim and even certify their derivative works as
>>>> >> compatible with other versions of their derivative works, but cannot
>>>> >> claim compatibility with Apache Hadoop unless their code passes the
>>>> >> tests and is consistent with the bug reports marked as ("by design").
>>>> >> Perhaps we should have tests that verify each of these "by design"
>>>> >> bugreps to make them more formal.
>>>> >>
>>>> >> Certification
>>>> >> -I have no idea what this means in EMC's case, they just say "Certified"
>>>> >> -As we don't do any certification ourselves, it would seem impossible
>>>> >> for us to certify that any derivative work is compatible.
>>>> >> -It may be best to state that nobody can certify their derivative as
>>>> >> "compatible with Apache Hadoop" unless it passes all current test suites
>>>> >> -And require that anyone who declares compatibility define what they
>>>> >> mean by this
>>>> >>
>>>> >> This is a good argument for getting more functional tests out there
>>>> >> -whoever has more functional tests needs to get them into a test module
>>>> >> that can be used to test real deployments.
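
PS: on Srivas's old-API/new-API example above, since it is exactly the kind of corner case a written spec plus the HCK would have to pin down, the scenario looks roughly like the sketch below. The library routine and the path are hypothetical -- only the FileContext/FSDataInputStream usage is meant to be real, and even that is just an illustration, not code from the Hadoop tree:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class MixedApiStreamSketch {

  public static void main(String[] args) throws Exception {
    // Application code opens the file through the newer FileContext API.
    FileContext fc = FileContext.getFileContext(new Configuration());
    FSDataInputStream in = fc.open(new Path("/data/events.log"));

    in.seek(4096); // application positions the stream

    // Stand-in for a third-party library (Cascading, Mahout, ...) written
    // against older FileSystem-era idioms; it receives the same stream.
    legacyLibraryScan(in);

    // What is in.getPos() guaranteed to be here? Today the answer lives
    // only in the code; a written spec should state it explicitly.
    System.out.println("position after library call: " + in.getPos());
    in.close();
  }

  // Hypothetical library routine, not a real API.
  static void legacyLibraryScan(FSDataInputStream in) throws Exception {
    byte[] header = new byte[16];
    in.readFully(0, header);   // positioned read: does it move the cursor?
    byte[] chunk = new byte[1024];
    in.read(chunk);            // relative read: starting from where?
  }
}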
