Sami Siren wrote:
Andrzej Bialecki wrote:
Sami Siren wrote:

Jukka Zitting was suggesting we should rethink the Nutch release packaging because of its size. I don't see this as a blocker for 1.0, but we could perhaps start the discussion about this anyway, so throw in your opinions...

I agree with you and Jukka that we should provide separate tarballs of source and binaries. This likely won't result in significant size reductions (anyway, what's a measly 90MB nowadays .. ;) but it would help other parties to deploy clean binaries and/or track the officially released sources.

The source package is a straightforward one. The size of the source package would be about 30GB, but the binary package will still remain quite big if we
           ^^^^

Now, this is big, indeed ;)

need to allow it to run in local and distributed mode (plugins in exploded format and also the .job + .war); the size of such a binary package would still be nearly 80G.

We could split the binary into yet smaller pieces: one for local mode, one for distributed mode, and the .war separately, but I am not sure if that's worth the effort.

I don't think so either. Please remember also that each binary sub-package may create its own range of support issues ...

How about the following: we build just 2 packages:

* binary: this includes only base hadoop libs in lib/ (enough to start a local job, no optional filesystems etc), the *.job and *.war files and scripts. Scripts would check for the presence of the plugins/ dir, and offer an option to create it from *.job. The assumption here is that this should be enough to run a full cycle in local mode, and that people who want to run a distributed cluster will first install a plain Hadoop release, and then just put the *.job and bin/nutch on the master.

* source: no build artifacts, no .svn (equivalent to svn export), simple tgz.
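
To make the "create plugins/ from *.job" step concrete, here is a rough sketch in Python (the real scripts would most likely be shell). It assumes the .job file is an ordinary zip/jar archive with the plugins stored under a top-level plugins/ prefix; the function name ensure_plugins and the default file name nutch.job are hypothetical, chosen only for illustration.

```python
import zipfile
from pathlib import Path


def ensure_plugins(install_dir, job_name="nutch.job", prefix="plugins/"):
    """Create install_dir/plugins from the job archive if it is missing.

    Returns the number of files extracted (0 if plugins/ already exists).
    Assumption: the job archive is a zip file with a top-level plugins/ dir.
    """
    install_dir = Path(install_dir)
    plugins_dir = install_dir / prefix.rstrip("/")
    if plugins_dir.is_dir():
        return 0  # plugins/ is already present, nothing to do

    extracted = 0
    with zipfile.ZipFile(install_dir / job_name) as job:
        for info in job.infolist():
            # Unpack only file entries stored under the plugins/ prefix.
            if info.filename.startswith(prefix) and not info.is_dir():
                job.extract(info, install_dir)
                extracted += 1
    return extracted
```

A launcher script could call this once at startup and ask the user for confirmation before extracting; the sketch leaves the prompt out and only shows the check and the unpacking.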


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
