Great summary, Andrew. I would add one more precipitating factor here: the arrival of a number of products that are very close to the Apache version of Hadoop but for which there is no good, widely accepted terminology that gives proper credit to their lineage while making clear the distinction from bit-for-bit copies of official Apache releases.
Some products are analogous to Hive, Pig, or HBase in that they are independent systems that run ON Hadoop (or close equivalents). These have no terminology problem because these products aren't Hadoop, but rather use Hadoop.

Other products contain Hadoop internally as a critical component but do not necessarily expose Hadoop capabilities to the end user (I can't name these products, but they exist). These products have little nomenclatural difficulty because the powered-by-Hadoop description fits very well.

The products with the terminology problem are the ones that add either curation and packaging (Cloudera) or substantial additional performance-enhancing components (MapR). These products are upwardly compatible with Apache Hadoop in that programs that run on Hadoop will very probably run on these Hadoop-like systems. The problem is that there is no good term for these products. They may even contain components that are bit-for-bit identical to the same components in Apache releases.

It is fair to say that these are not Apache-released software, but it is also fair to say that there ought to be a better name for this class of products.

On Mon, Jun 20, 2011 at 4:39 PM, Andrew Purtell <[email protected]> wrote:

> Hadoop I think needs to be more careful. What triggered this discussion is
> the arrival of new players releasing products they call Hadoop but
> containing severe changes the community, by way of the ASF umbrella we all
> work under, had nothing to do with designing or developing. And some of
> these are being open sourced as a Hadoop. There is no Linus here. Which of
> these is _the_ Hadoop? As a would-be contributor, which should I select?
