Hi,

I've packaged Hadoop for Debian (it's still in the upload queue for verification [1]). One common annoyance when packaging Java applications for a Free Software distribution is the need to repackage the upstream tarball. The repackaging is necessary because Debian may only distribute binary files built from source that is also available from Debian, so we build the jar/war files ourselves to make sure there's nothing we don't have the sources for. It would save packagers one annoying and time-consuming step if Java upstream projects released an additional tarball without any binary files or third-party code.

I'm asking you first because many other projects (like ZooKeeper) took or take Hadoop as an example for their build infrastructure.

For your orientation, below is the full list of patterns I used to filter the Hadoop tarball, usable with tar --exclude.
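As a rough sketch, the repack step looks like this (assuming GNU tar and the 0.20.1 upstream tarball; the output file name follows Debian's convention for repackaged sources, and only a few of the patterns are shown):

    # Unpack the upstream release, then repack it without the problematic
    # files. GNU tar matches --exclude patterns against any part of a
    # member name, so "*.jar" catches jars anywhere in the tree.
    tar -xzf hadoop-0.20.1.tar.gz
    tar -czf hadoop-0.20.1+dfsg.orig.tar.gz \
        --exclude='*.jar' \
        --exclude='uming.*' \
        --exclude='prototype.js' \
        --exclude='hadoop-0.20.1/docs' \
        hadoop-0.20.1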
"*.jar", "uming.*", "prototype.js", "config.sub", "config.guess", "ltmain.sh", "Makefile.in", "configure", "aclocal.m4", "config.h.in", "install-sh", "autom4te.cache", "depcomp", "missing", "pipes/compile", "src/contrib/eclipse-plugin/resources/*.jpg", "src/contrib/eclipse-plugin/resources/*.png", "src/contrib/eclipse-plugin/resources/*.gif", "hadoop-0.20.1/src/core/org/apache/hadoop/record/compiler/generated/*.java", "hadoop-0.20.1/src/docs/cn/build", "hadoop-0.20.1/c++", "hadoop-0.20.1/contrib", "hadoop-0.20.1/lib/native", "hadoop-0.20.1/librecordio", "hadoop-0.20.1/src/contrib/thriftfs/gen-*", "hadoop-0.20.1/docs", There were different reasons why stuff needed to be filtered: - unclear license (uming.*) - unclear origin (images in the eclipse plugin) - precompiled documentation / code / hadoop binaries - pregenerated C/C++ automake files - third party libraries (prototype.js, lib/*.jar) If you'd be willing to release an additional tarball like this, I'd provide the necessary patch to your build.xml [1] http://ftp-master.debian.org/new.html Best regards, Thomas Koch, http://www.koch.ro