FWIW, the FileSystemContractBaseTest class and the FileContext*BaseTest classes (and their concrete subclasses) are probably the closest thing we have to compatibility tests for FileSystem and FileContext implementations in Hadoop.

Tom
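[To make Tom's pointer concrete: a subclass of FileSystemContractBaseTest only has to wire the FileSystem under test into the inherited contract tests. A minimal sketch in the JUnit 3 style of that era follows; MyFileSystem, the myfs:// scheme, and the fs.myfs.impl key are hypothetical stand-ins for whatever implementation is being checked for compatibility.]

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileSystemContractBaseTest;

public class TestMyFileSystemContract extends FileSystemContractBaseTest {

  @Override
  protected void setUp() throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical: register the implementation under test for its scheme.
    conf.setClass("fs.myfs.impl", MyFileSystem.class, FileSystem.class);
    // The base class runs every inherited contract test against this fs.
    fs = FileSystem.get(URI.create("myfs://localhost/"), conf);
    super.setUp();
  }
}

[All of the inherited test methods then exercise the implementation; nothing beyond the setUp() wiring is needed.]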
On Mon, Jan 31, 2011 at 7:59 AM, Steve Loughran <[email protected]> wrote:
> On 31/01/11 14:32, Chris Douglas wrote:
>>
>> Steve-
>>
>> It's hard to answer without more concrete criteria. Is this a
>> trademark question affecting the marketing of a product? A
>> cross-compatibility taxonomy for users? The minimum criteria to
>> publish a paper/release a product without eye-rolling? The particular
>> compatibility claims made by a system will be nuanced and specific; a
>> runtime that executes MapReduce jobs as they would run in Hadoop can
>> simply make that claim, whether it uses parts of MapReduce, HDFS, or
>> neither.
>
> No, I'm thinking more about what large-scale tests need to be run
> against the codebase before you can say "it works", and then how to
> show, after some changes, that it still works.
>
>>
>> For the various distributions "Powered by Apache Hadoop," one would
>> assume that compatibility will vary depending on the feature set and
>> the audience. A distribution that runs MapReduce applications
>> as written for Apache Hadoop may be incompatible with a user's
>> deployed metrics/monitoring system. Some random script to scrape the
>> UI may not work. The product may only scale to 20 nodes. Whether these
>> are "compatible with Apache Hadoop" is awkward to answer generally,
>> unless we want to define the semantics of that phrase by policy.
>>
>> To put it bluntly, why would we bother to define such a policy? One
>> could assert that a fully compatible system would implement all the
>> public/stable APIs as defined in HADOOP-5073, but who would that help?
>> And though interoperability is certainly relevant to systems built on
>> top of Hadoop, is there a reason the Apache project needs to be
>> involved in defining the standards for compatibility among them?
>
> Agreed, I'm just thinking about naming and definitions. Even with the
> stable/unstable and internal/external splits, there's still the question
> of what the semantics of operations are, both explicit ("this operation
> does X") and implicit ("and it takes less than Y seconds to do it").
> It's those implicit things that always catch you out (indeed, they are
> the argument points in things like the Java and Java EE compatibility
> test kits).
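[To illustrate Steve's explicit/implicit distinction: the first assertion below checks documented behaviour (rename moves the file), while the second encodes a timing assumption that no FileSystem API actually promises. Both methods are a sketch only, written as if they lived in a FileSystemContractBaseTest subclass with an fs field; the 10-second bound is arbitrary.]

import java.io.IOException;
import org.apache.hadoop.fs.Path;

// Explicit semantics: "rename moves the file" can be asserted directly.
public void testRenameMovesFile() throws IOException {
  Path src = new Path("/test/src");
  Path dst = new Path("/test/dst");
  fs.create(src).close();
  assertTrue(fs.rename(src, dst));
  assertFalse(fs.exists(src));
  assertTrue(fs.exists(dst));
}

// Implicit semantics: a timing expectation nothing in the API guarantees,
// which is exactly the kind of assumption that catches callers out.
public void testDeleteIsFast() throws IOException {
  Path p = new Path("/test/timing");
  fs.create(p).close();
  long start = System.currentTimeMillis();
  fs.delete(p, true);
  assertTrue("delete took too long",
      System.currentTimeMillis() - start < 10 * 1000L);
}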
