Sami Siren wrote:
Andrzej Bialecki wrote:
Sami Siren wrote:
Jukka Zitting was suggesting we should rethink the Nutch release
packaging because of its size. I don't see this as a blocker for 1.0
but we could perhaps start the discussion about this anyway so throw
in your opinions...
I agree with you and Jukka that we should provide separate tarballs of
source and binaries. This likely won't result in significant size
reductions (anyway, what's a measly 90MB nowadays .. ;) but it would
help other parties to deploy clean binaries and/or track the
officially released sources.
The source package is a straightforward one. Size of source package would
be about 30GB. but the binary package will still remain quite big if we
^^^^
Now, this is big, indeed ;)
need to allow it to run on local and distributed mode (plugins as
exploded format and also the .job + .war), size of such binary package
would still be nearly 80G.
We could split the binary to yet smaller pieces: one for local mode, one
for distributed mode, and the .war separately but I am not sure if
that's worth the effort.
I don't think so either. Please remember also that each binary
sub-package may create its own range of support issues ...
How about the following: we build just 2 packages:
* binary: this includes only base Hadoop libs in lib/ (enough to start a
local job, no optional filesystems etc), the *.job and *.war files and
scripts. Scripts would check for the presence of plugins/ dir, and offer
an option to create it from *.job. Assumption here is that this should be
enough to run full cycle in local mode, and that people who want to run
a distributed cluster will first install a plain Hadoop release, and
then just put the *.job and bin/nutch on the master.
* source: no build artifacts, no .svn (equivalent to svn export), simple
tgz.
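
To make the "check for plugins/ and create it from *.job" idea concrete, here
is a rough sketch of what such a check could look like. It is written in
Python purely for illustration (the real bin/nutch is a shell script), and it
assumes that the .job file is an ordinary zip/jar archive with the plugins
stored under a top-level plugins/ entry. The function name ensure_plugins and
its parameters are hypothetical, not anything that exists in Nutch.

```python
import zipfile
from pathlib import Path


def ensure_plugins(job_file, install_dir, prefix="plugins/"):
    """Unpack the plugins/ tree from a .job archive if it is missing.

    Returns the number of files extracted (0 if plugins/ already exists).
    Hypothetical helper: the layout of the .job file is an assumption.
    """
    install_dir = Path(install_dir)
    plugins_dir = install_dir / prefix.rstrip("/")
    if plugins_dir.is_dir():
        return 0  # plugins/ is already there, nothing to do

    extracted = 0
    with zipfile.ZipFile(job_file) as job:
        for name in job.namelist():
            # Assumption: plugins sit under a top-level plugins/ entry.
            # Skip directory entries and anything outside that prefix.
            if name.startswith(prefix) and not name.endswith("/"):
                job.extract(name, install_dir)
                extracted += 1
    return extracted
```

A launcher script would call something like this once before starting a local
job, and could ask the user for confirmation first, as suggested above.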
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com