Jordà Polo wrote:
On Wed, Dec 30, 2009 at 07:53:43PM +0100, Thomas Koch wrote:
today I tried to run the Cloudera Debian dist on a 4 machine cluster. I still
have some itches, see my list below. Some of them may require a fix in the
packaging.
Therefore I thought that it may be time to start an official Debian package of
hadoop with a public Git repository so that everybody can participate.
Would Cloudera support this? I'd package Hadoop 0.20 and apply all the
Cloudera patches (managed with topgit[1]).
At this point I'd like your opinion: would it be wise to have versioned
binary packages like hadoop-18 and hadoop-20, or just a plain hadoop package
for Debian?

Hi Thomas,

I have been thinking about an official Hadoop Debian package for a while
too.

If you want "official" as in can say "Apache Hadoop" on it, then it will need to be managed and released as an apache project. That means somewhere in ASF SVN. If you want to cut your own, please give it a different name to avoid problems later.

The main issue that prevents the inclusion of the current Cloudera
package into Debian is that it depends on Sun's Java. I think it would
be interesting, at least for an official Debian package, to depend on
OpenJDK in order to make it possible to distribute it in "main" instead
of "contrib".

+1 to more work on packaging; I'd go so far as to push for a separate "deployment" subproject which would be downstream of everything, including HBase and other layers.

I view .deb and .rpm releases as stuff you would push out to clusters, maybe with custom config files for everything else. Having the ability to create your own packages on demand would appear to be something that people need (disclaimer: I do create my own RPMs).

I would go for the package not bothering to mention which Java it depends on, as that lets you run on any JVM, JRockit included. Or drive the .deb creation process such that you can decide at release time what the options are for any specific target cluster.
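
To sketch that release-time approach (everything here is invented for illustration: the template file, the token and the property name), an Ant target could stamp the chosen Depends line into the control file; @-style tokens avoid clashing with debhelper's own ${...} substvars:

    <target name="deb-control">
      <!-- debian/control.template contains the line: Depends: @JAVA_DEPENDS@ -->
      <copy file="debian/control.template" tofile="debian/control" overwrite="true">
        <filterset>
          <!-- java.depends is chosen per target cluster, e.g. in build.properties -->
          <filter token="JAVA_DEPENDS" value="${java.depends}"/>
        </filterset>
      </copy>
    </target>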


Also, note that in order to fit into Debian's package autobuilding
system, some scripts will probably require tweaking. For instance,
by default Hadoop downloads dependencies at build time using ivy, but
Debian packages should use already existing packages. Incidentally,
Hadoop depends on some libraries that aren't available in Debian yet,
such as xmlenc, so there is even more work to do.
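
As an illustration of pointing Ivy at already-installed packages instead of the network, an ivysettings.xml along these lines should work (a sketch; the /usr/share/java pattern assumes the Debian Java policy layout):

    <ivysettings>
      <settings defaultResolver="debian"/>
      <resolvers>
        <!-- resolve artifacts from JARs shipped by Debian packages; no downloads -->
        <filesystem name="debian">
          <artifact pattern="/usr/share/java/[artifact].jar"/>
        </filesystem>
      </resolvers>
    </ivysettings>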

Well, we'll just have to ignore the Debian autobuilding process then, won't we?

There are some hooks in Ivy and Ant to give local machine artifacts priority over other stuff, but it's not ideal. Let's just say there are differences of opinion between some of the Linux packaging people and others as to what the correct way to manage dependencies is. I'm in the "everything is specified under SCM" camp; others are in the "build against what you find" world.
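
One of those hooks, for what it's worth: an Ivy chain resolver with returnFirst, so anything already on the local disk wins over the public repository (a sketch; the local pattern is invented):

    <resolvers>
      <chain name="default" returnFirst="true">
        <!-- whatever is already on the local disk takes priority -->
        <filesystem name="local">
          <artifact pattern="${ivy.default.ivy.user.dir}/local/[organisation]/[module]/[revision]/[artifact].[ext]"/>
        </filesystem>
        <!-- fall back to the public Maven repository -->
        <ibiblio name="public" m2compatible="true"/>
      </chain>
    </resolvers>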

I cut my RPMs by:

* pushing the .rpm template through a <copy> with property expansion; this creates an RPM containing all the version markers set up right, driven by my build's properties files (see the sketch after this list).
* not declaring dependencies on anything, Java or any other JAR like log4J. This ensures my code runs with the JARs I told it to, not anything else. It also gives me the option to sign all the JARs, which the normal Linux packaging doesn't like.
* releasing the tar of everything needed to sign the JARs and create the RPMs as a redistributable. This gives anyone else the option to create their own RPMs too. You don't need to move the entire build/release process to source RPMs or .debs for this, any more than the Ant or log4J packages get built/released this way.
* <scp>-ing the packages to a VMware or VirtualBox image of each supported platform, ssh-ing in and exec-ing the rpm uninstall/install commands, then walking the scripts through their lifecycle (also sketched below). You were planning on testing the upgrade process, weren't you?
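
Roughly, those first and last steps in Ant (a sketch only: file names, properties and hosts are all invented, and <scp>/<sshexec> are Ant's optional tasks, which need jsch on the classpath):

    <target name="rpm-spec">
      <!-- expand ${version} and friends from the build's properties files into the spec -->
      <copy file="hadoop.spec.template" tofile="build/hadoop.spec" overwrite="true">
        <filterchain>
          <expandproperties/>
        </filterchain>
      </copy>
    </target>

    <target name="test-install">
      <!-- push the package to a scratch VM and walk it through uninstall/install -->
      <scp file="build/hadoop-${version}.rpm" todir="root@${test.host}:/tmp"
           keyfile="${user.home}/.ssh/id_rsa" passphrase="" trust="true"/>
      <sshexec host="${test.host}" username="root"
               keyfile="${user.home}/.ssh/id_rsa" passphrase="" trust="true"
               command="rpm -e hadoop || true; rpm -Uvh /tmp/hadoop-${version}.rpm"/>
    </target>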

-steve


(Anyway, I'm interested in the package, so let me know if you need some
help and want to set up a group on alioth or something.)

A lot of the fun here is not going to be setting up the package files (
