Nice reality check, and thanks for describing how it was addressed elsewhere, Steve.
As you say, it sounds like a large undertaking, but it would be a sweet service for the downstreamers.

St.Ack

On Thu, Jun 9, 2011 at 4:42 AM, Steve Loughran wrote:
> On 06/08/2011 06:41 PM, Suresh Srinivas wrote:
>>
>> I do not see any issue with the change that Todd has made. We have done
>> similar changes in HDFS-1586 in the past.
>>
>> Making APIs public comes with a cost. That is what we are avoiding with
>> LimitedPrivate. The intention was to include the following projects that
>> are closely tied to Hadoop as projects eligible for LimitedPrivate:
>> {"HBase", "HDFS", "Hive", "MapReduce", "Pig"}. This list could grow in the
>> future.
>
> I'm going to talk about my experience on the Ant team.
>
> One of the lessons of that project is that in the open source world, you
> can't predict how your code gets used, or control it. If someone wants to
> take your app and use it as a library, they can. If someone wants to do
> something completely unexpected with that library, they can. And this is a
> good thing, because your code gets used. Yes, you get new bugreps, but every
> person using your code is someone not using somebody else's code. You win.
>
> The other lesson from that is the following: in open source, there is no
> such thing as private code.
>
> * If you mark something as package scoped, they just inject their classes
> into your package (and who hasn't done that with their Hadoop extensions?).
> * If you mark something as protected, they subclass and open up its privacy.
> * If you mark something as private, they edit your source and create a new
> JAR with the relaxed permission.
>
> For any of these actions, you end up fielding the bugreps, as the stack
> trace points to you. And it increases maintenance costs for everyone.
>
>
> Alternatively, they cut and paste your code into their codebase, possibly
> (but not always) retaining the Apache credits.
>
> That
> * complicates copyright and lawsuits:
> http://www.theserverside.com/news/thread.tss?thread_id=29958
>
> * increases maintenance costs for everyone, especially if there are
> security issues with the original code.
>
>> When such projects break because of an API change, we can coordinate as a
>> community and fix the issues. This is not true when some application that
>> we do not know of breaks!
>
> The way Ant handled this was with Gump, the nightly clean build of all the
> OSS Java projects built with Ant:
> http://vmgump.apache.org/gump/public/
>
> All the projects thought they were getting a free CI build run, but what it
> really was was a regression test of Ant and every single OSS project. If a
> change in Ant broke anyone's build, we noticed. If a change in Log4J broke a
> build, someone noticed. It became a rapid-response regression test for the
> entire OSS suite.
>
> Sadly, it doesn't work so well. I'd blame Maven, but the move to Ivy
> dependencies doesn't help either; it complicates classpaths no end.
>
> Even so, the idea is great: build and test your downstream applications, and
> the things you depend on, so you find problems within 24 hours of the change
> being committed, regardless of which project committed the change.
>
> The way to do it now would be with Jenkins, not just building and testing
> Hadoop-{core, hdfs, mapreduce}, but:
> - building and publishing every upstream dependency;
> - testing against the trunk versions built locally;
> - building and testing against the Ivy-versioned artifacts that are
> controlled by the version.properties.
>
> Together, this flags up when something works against the old artifacts but
> doesn't work against the trunk versions: those are their regressions, caught
> early.
>
> Downstream:
> - build and test the OSS projects that work with Hadoop.
> That's the Apache ones (HBase, Mahout, Pig, Hive, Hama, etc.) and the other
> ones, such as Cascading.
>
> That can be offered as a service to these projects ("we will build and test
> your code against our trunk"), a service designed to benefit everyone. They
> find their bugs; we find regressions.
>
> This is a pretty complex project, especially when you think about the
> challenge of testing that your RPM generation code produces RPMs that
> install (I bring up clean CentOS VMs for such a purpose), but without it you
> don't get everything working together, which is the state things appear to
> be in today.
>
> Ignoring the RPM install and test problems, if people are interested in
> working on this, we should be able to do a lot of it on Jenkins. Who is
> willing to get involved?
>
> -Steve
>
