Jordà Polo wrote:
On Wed, Dec 30, 2009 at 07:53:43PM +0100, Thomas Koch wrote:
today I tried to run the Cloudera Debian dist on a 4 machine cluster. I still
have some itches, see my list below. Some of them may require a fix in the
packaging.
Therefore I thought that it may be time to start an official Debian package of
hadoop with a public Git repository so that everybody can participate.
Would Cloudera support this? I'd package Hadoop 0.20 and apply all the
Cloudera patches (managed with topgit[1]).
At this point I'd like your opinion: would it be wise to have versioned
binary packages like hadoop-18 and hadoop-20, or just a plain hadoop package
for Debian?

Hi Thomas,

I have been thinking about an official Hadoop Debian package for a while
too.

If you want "official" as in can say "Apache Hadoop" on it, then it will need to be managed and released as an apache project. That means somewhere in ASF SVN. If you want to cut your own, please give it a different name to avoid problems later.

The main issue that prevents the inclusion of the current Cloudera
package into Debian is that it depends on Sun's Java. I think it would
be interesting, at least for an official Debian package, to depend on
OpenJDK in order to make it possible to distribute it in "main" instead
of "contrib".

+1 to more work on packaging; I'd go so far as to push for a separate "deployment" subproject which would be downstream of everything, including HBase and other layers.

I view .deb and .rpm releases as stuff you would push out to clusters, maybe with custom config files for everything else. Having the ability to create your own packages on demand would appear to be something that people need (disclaimer: I do create my own RPMs).

I would go for the package not bothering to mention which Java it depends on, as that lets you run on any JVM, JRockit included. Or drive the .deb creation process such that you can decide at release time what the options are for any specific target cluster.
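
To sketch that release-time approach (everything here is invented for illustration: the template file, the token and the property name), an Ant target could stamp the chosen Depends line into the control file; @-style tokens avoid clashing with debhelper's own ${...} substvars:

    <target name="deb-control">
      <!-- debian/control.template contains the line: Depends: @JAVA_DEPENDS@ -->
      <copy file="debian/control.template" tofile="debian/control" overwrite="true">
        <filterset>
          <!-- java.depends is chosen per target cluster, e.g. in build.properties -->
          <filter token="JAVA_DEPENDS" value="${java.depends}"/>
        </filterset>
      </copy>
    </target>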


Also, note that in order to fit into Debian's package autobuilding
system, some scripts will probably require tweaking. For instance,
by default Hadoop downloads dependencies at build time using ivy, but
Debian packages should use already existing packages. Incidentally,
Hadoop depends on some libraries that aren't available in Debian yet,
such as xmlenc, so there is even more work to do.
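
As an illustration of pointing Ivy at already-installed packages instead of the network, an ivysettings.xml along these lines should work (a sketch; the /usr/share/java pattern assumes the Debian Java policy layout):

    <ivysettings>
      <settings defaultResolver="debian"/>
      <resolvers>
        <!-- resolve artifacts from JARs shipped by Debian packages; no downloads -->
        <filesystem name="debian">
          <artifact pattern="/usr/share/java/[artifact].jar"/>
        </filesystem>
      </resolvers>
    </ivysettings>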

Well, we'll just have to ignore the Debian autobuilding process then, won't we?

There are some hooks in Ivy and Ant to give local machine artifacts priority over other stuff, but it's not ideal. Let's just say there are differences of opinion between some of the Linux packaging people and others as to what the correct way to manage dependencies is. I'm in the "everything is specified under SCM" camp; others are in the "build against what you find" world.
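
One of those hooks, for what it's worth: an Ivy chain resolver with returnFirst, so anything already on the local disk wins over the public repository (a sketch; the local pattern is invented):

    <resolvers>
      <chain name="default" returnFirst="true">
        <!-- whatever is already on the local disk takes priority -->
        <filesystem name="local">
          <artifact pattern="${ivy.default.ivy.user.dir}/local/[organisation]/[module]/[revision]/[artifact].[ext]"/>
        </filesystem>
        <!-- fall back to the public Maven repository -->
        <ibiblio name="public" m2compatible="true"/>
      </chain>
    </resolvers>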

I cut my RPMs by:

* pushing the .rpm template through a <copy> with property expansion; this creates an RPM containing all the version markers set up right, driven by my build's properties files (see the sketch after this list).
* not declaring dependencies on anything, Java or any other JAR like log4J. This ensures my code runs with the JARs I told it to, not anything else. It also gives me the option to sign all the JARs, which the normal Linux packaging doesn't like.
* releasing the tar of everything needed to sign the JARs and create the RPMs as a redistributable. This gives anyone else the option to create their own RPMs too. You don't need to move the entire build/release process to source RPMs or .debs for this, any more than the Ant or log4J packages get built/released this way.
* <scp>-ing the packages to a VMware or VirtualBox image of each supported platform, ssh-ing in and exec-ing the rpm uninstall/install commands, then walking the scripts through their lifecycle (also sketched below). You were planning on testing the upgrade process, weren't you?
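
Roughly, those first and last steps in Ant (a sketch only: file names, properties and hosts are all invented, and <scp>/<sshexec> are Ant's optional tasks, which need jsch on the classpath):

    <target name="rpm-spec">
      <!-- expand ${version} and friends from the build's properties files into the spec -->
      <copy file="hadoop.spec.template" tofile="build/hadoop.spec" overwrite="true">
        <filterchain>
          <expandproperties/>
        </filterchain>
      </copy>
    </target>

    <target name="test-install">
      <!-- push the package to a scratch VM and walk it through uninstall/install -->
      <scp file="build/hadoop-${version}.rpm" todir="root@${test.host}:/tmp"
           keyfile="${user.home}/.ssh/id_rsa" passphrase="" trust="true"/>
      <sshexec host="${test.host}" username="root"
               keyfile="${user.home}/.ssh/id_rsa" passphrase="" trust="true"
               command="rpm -e hadoop || true; rpm -Uvh /tmp/hadoop-${version}.rpm"/>
    </target>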

-steve


(Anyway, I'm interested in the package, so let me know if you need some
help and want to set up a group on alioth or something.)

A lot of the fun here is not going to be setting up the package files (
