On 06/08/2011 06:41 PM, Suresh Srinivas wrote:
> I do not see any issue with the change that Todd has made. We have done
> similar changes in HDFS-1586 in the past.
> Making APIs public comes with a cost. That is what we are avoiding with
> LimitedPrivate. The intention was to include the following projects that are
> closely tied to Hadoop as projects eligible for LimitedPrivate.
> {"HBase", "HDFS", "Hive", "MapReduce", "Pig"}. This list could grow in the
> future.
I'm going to talk about my experience on the Ant team.
One of the lessons of that project is that in the open source world, you
can't predict how your code gets used, or control it. If someone wants
to take your app and use it as a library -they can. If someone wants to
do something completely unexpected with that library -they can. And this
is a good thing, because your code gets used. Yes, you get new bugreps,
but every person using your code is someone not using somebody else's
code. You win.
The other lesson from that is the following: in open source, there is no
such thing as private code.
* If you mark something as package scoped, they just inject their
classes into your package (and who hasn't done that with their Hadoop
extensions?).
* If you mark something as protected, they subclass and open up its
privacy.
* If you mark something as private, they edit your source and create a
new JAR with the relaxed permission.
For any of these actions, you end up fielding the bugreps, as the stack
trace points to you. And it increases maintenance costs for everyone.
Alternatively, they cut and paste your code into their codebase, possibly
-but not always- retaining the Apache credits.
That:
* complicates copyright and lawsuits:
http://www.theserverside.com/news/thread.tss?thread_id=29958
* increases maintenance costs for everyone, especially if there are
security issues with the original code.
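
To make the "protected" bullet above concrete, here is a minimal sketch. The
class and method names are hypothetical and are not taken from Hadoop; the only
point is that Java lets a subclass widen a protected method to public.

```java
// Hypothetical names throughout; nothing here is real Hadoop code.
public class ProtectedDemo {

    /** Stands in for a library class whose author wanted to limit access. */
    static class LibraryThing {
        // Intended for the library's own subclasses only.
        protected String internalState() {
            return "internal state";
        }
    }

    /** A downstream user subclasses it and widens the method to public. */
    static class OpenedUp extends LibraryThing {
        @Override
        public String internalState() { // widening protected to public is legal Java
            return super.internalState();
        }
    }

    public static void main(String[] args) {
        // Any caller can now reach what was meant to be protected.
        System.out.println(new OpenedUp().internalState());
    }
}
```

Once something like OpenedUp ships in someone else's JAR, a change to
internalState() breaks code the library authors have never seen.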
> When such projects break because of API change, we can co-ordinate as
> community and fix the issues. This is not true for some application that we
> do not know of breaks!
The way Ant handled this was with Gump, the nightly clean build of all the
OSS Java projects built with Ant:
http://vmgump.apache.org/gump/public/
For all the projects, it looked like a free CI build run, but what it
really was was a regression test of Ant and of every single OSS project.
If a change in Ant broke anyone's build: we noticed. If a change in Log4J
broke a build, someone noticed. It became a rapid-response regression test
for the entire OSS suite.
Sadly, it doesn't work so well. I'd blame Maven, but the move to Ivy
dependencies doesn't help either; it complicates classpaths no end.
Even so, the idea is great: build and test your downstream applications,
and the things you depend on, so you find problems within 24 hours of
the change being committed -regardless of which project committed the
change.
The way to do it now would be with Jenkins, not just building and
testing Hadoop-{core, hdfs, mapreduce}, but also:
-building and publishing every upstream dependency.
-building and testing against the trunk versions built locally.
-building and testing against the ivy-versioned artifacts that are
controlled by the version.properties
Together this flags up when something works against the old artifacts
but doesn't work against the trunk versions: those are their regressions,
caught early.
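
As a rough sketch of that comparison: given the test result against the pinned
artifacts and the result against locally built trunk, the interesting case is
"passes pinned, fails trunk". The class, method and verdict names below are my
own assumptions for illustration; they are not part of Jenkins, Ivy or Hadoop,
and only the UPSTREAM_REGRESSION case comes from the description above.

```java
// Illustrative only: all names here are hypothetical.
public class TrunkRegressionCheck {

    enum Verdict { OK, UPSTREAM_REGRESSION, FIXED_ON_TRUNK, BROKEN_EVERYWHERE }

    /**
     * Classifies one project's nightly result from two test runs:
     * one against the pinned (versioned) artifacts, one against trunk.
     */
    static Verdict classify(boolean passesAgainstPinned, boolean passesAgainstTrunk) {
        if (passesAgainstPinned && passesAgainstTrunk) {
            return Verdict.OK;
        }
        if (passesAgainstPinned) {
            // Works with the old artifacts, fails on trunk: an upstream change broke us.
            return Verdict.UPSTREAM_REGRESSION;
        }
        if (passesAgainstTrunk) {
            return Verdict.FIXED_ON_TRUNK;
        }
        // Fails either way: most likely a problem in the project itself.
        return Verdict.BROKEN_EVERYWHERE;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, false));
    }
}
```

A nightly job would only need to raise an alert for UPSTREAM_REGRESSION; the
other verdicts are there to keep the report readable.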
Downstream:
-build and test the OSS projects that work with Hadoop.
That's the Apache ones: HBase, Mahout, Pig, Hive, Hama etc, and the
other ones, such as Cascading.
That can be offered as a service to these projects: "we will build and
test your code against our trunk", a service designed to benefit
everyone. They find their bugs, we find regressions.
This is a pretty complex project, especially when you think about the
challenge of testing that the RPMs your RPM generation code produces will
actually install (I bring up clean CentOS VMs for such a purpose), but
without it you don't get everything working together, which is the state
things appear to be in today.
Ignoring the RPM install & test problems, if people are interested in
working on this, we should be able to do a lot of it on Jenkins. Who is
willing to get involved?
-Steve