Eric J. Christeson wrote:
On Mar 19, 2009, at 12:03 PM, Sami Siren wrote:
Andrzej Bialecki wrote:
How about the following: we build just 2 packages:
* binary: this includes only base hadoop libs in lib/ (enough to
start a local job; no optional filesystems, etc.), the *.job and *.war
files and scripts. Scripts would check for the presence of a plugins/
dir, and offer an option to create it from *.job (a rough sketch of such
a check follows below). The assumption here is that this should be
enough to run a full cycle in local mode, and that
people who want to run a distributed cluster will first install a
plain Hadoop release, and then just put the *.job and bin/nutch on
the master.
* source: no build artifacts, no .svn (equivalent to svn export),
simple tgz.
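A minimal sketch of what such a plugins/ check in bin/nutch could look
like, assuming the nutch-*.job file is a zip archive that contains a
plugins/ directory; the NUTCH_HOME variable and the prompt wording are
only illustrative, not part of the proposal:

  # bash sketch of the proposed plugins/ check in bin/nutch
  NUTCH_HOME="$(cd "$(dirname "$0")/.." && pwd)"
  if [ ! -d "$NUTCH_HOME/plugins" ]; then
    # no unpacked plugins yet; offer to extract them from the job file
    JOB_FILE=$(ls "$NUTCH_HOME"/nutch-*.job 2>/dev/null | head -n 1)
    if [ -n "$JOB_FILE" ]; then
      read -p "No plugins/ dir found. Extract it from $JOB_FILE? [y/N] " ans
      if [ "$ans" = "y" ]; then
        (cd "$NUTCH_HOME" && unzip -q "$JOB_FILE" 'plugins/*')
      fi
    fi
  fi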
This sounds good to me. Additionally, some new documentation will need
to be written.
Distributed mode is a little more complicated than just dropping *.job and
bin/nutch on a Hadoop install. Will this even work unless one edits
config/<stuff> and builds a new .job? Anyone running distributed Nutch
probably isn't doing something trivial, so a step-by-step
config how-to would probably be a good idea.
Actually, this works very well and it _is_ just a matter of dropping the
*.job file and a (slightly) modified bin/nutch.
Some time ago I committed a fix that removed the Hadoop artifacts from the
Nutch *.job file. This was exactly to avoid the confusion that multiple
hadoop-site.xml and hadoop*.jar files caused (one in your Hadoop install and
the other in your Nutch job jar). So now the only place where you should
edit Hadoop-related stuff is your Hadoop conf/ dir, and the only
place where you should edit Nutch-related stuff is your Nutch conf/
dir (and after that you do indeed need to rebuild the *.job jar and copy
the new version to your Hadoop master).
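In practice that workflow could look roughly like the following; the ant
target, paths, hostname and the inject example here are only illustrative,
not taken from this thread:

  # rebuild the Nutch job jar after editing Nutch config locally
  vi conf/nutch-site.xml
  ant job                          # produces build/nutch-<version>.job

  # copy the new job file and the modified launcher to the Hadoop master;
  # Hadoop itself is configured only in its own conf/ dir on the cluster
  scp build/nutch-*.job user@master:/opt/nutch/
  scp bin/nutch user@master:/opt/nutch/bin/

  # run a Nutch step on the cluster, e.g. injecting seed URLs
  ssh user@master 'cd /opt/nutch && bin/nutch inject crawl/crawldb urls'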
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com