>> Yup. And that's a second major problem -- dependency leakage.
>>
>> > Is this not a common issue with maven artifacts? How do people
>> > generally deal with it?
Yes, this is a common issue. In Oozie we've dealt with it by excluding from the Hadoop dependencies all sorts of things that may come in different versions than the ones we use.

Things get worse because of incorrect dependency classification (e.g. junit defined as required for execution, when it should be marked scope=test) and unnecessary dependencies (e.g. log4j bringing in JMS and Mail, which should be marked optional=true).

And finally, and I think this is the worst offender: Hadoop does not have a client artifact, thus bringing to the client all sorts of JARs that are not needed when using the client APIs.

To complicate things further, with 0.23 we are introducing several artifacts (hadoop-common, hadoop-hdfs, hadoop-mapreduce-client-*) that must be included by clients. These are different from the old hadoop-core, thus forcing downstream projects (in the case of Maven projects) to use profiles to include one set or the other.

A solution would be that in 0.23+ we have an umbrella hadoop-core artifact that groups all the Hadoop artifacts needed by clients, excluding the ones that are not needed.

If you think this is a good idea we should move this discussion to the Hadoop alias.

Thanks.

Alejandro
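A minimal sketch of the workarounds and fixes described above, in POM form. The versions and the specific excluded artifacts here are illustrative assumptions, not the actual exclusion list Oozie uses:

```xml
<dependencies>
  <!-- Downstream workaround: explicitly exclude leaked transitive
       dependencies that the client APIs don't need (illustrative list). -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.20.2</version>
    <exclusions>
      <exclusion>
        <groupId>org.mortbay.jetty</groupId>
        <artifactId>jetty</artifactId>
      </exclusion>
      <exclusion>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- How the upstream POMs could classify these instead, so downstream
       projects would not need exclusions at all: -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.8.2</version>
    <scope>test</scope>          <!-- test-only, not required at runtime -->
  </dependency>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.16</version>
    <optional>true</optional>    <!-- JMS/Mail no longer leak downstream -->
  </dependency>
</dependencies>
```

With `scope=test` the artifact never reaches consumers' classpaths, and with `optional=true` it is not pulled in transitively; the proposed umbrella client artifact would make the per-project exclusion block unnecessary.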
