I don't use the Nutch web application, but... you don't have to
start Nutch in the searcher directory. You can set the location of
the searcher dir in the nutch-site.xml config file.

Add this node and set the location of your index:

<property>
 <name>searcher.dir</name>
 <value>/your/path/to/your/index</value>
 <description>
 Path to root of crawl.  This directory is searched (in
 order) for either the file search-servers.txt, containing a list of
 distributed search servers, or the directory "index" containing
 merged indexes, or the directory "segments" containing segment
 indexes.
 </description>
</property>
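The description above implies a lookup order inside searcher.dir. The following hypothetical Java sketch only illustrates that order. It is not NutchBean's actual code, and the class, enum and method names (`SearcherDirLookup`, `Kind`, `resolve`) are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/**
 * Hypothetical sketch of the searcher.dir lookup order described in the
 * property: search-servers.txt, then index/, then segments/.
 * Not Nutch's real implementation.
 */
final class SearcherDirLookup {

    enum Kind { DISTRIBUTED, MERGED_INDEX, SEGMENTS }

    static final class Result {
        final Kind kind;
        final Path path;

        Result(Kind kind, Path path) {
            this.kind = kind;
            this.path = path;
        }

        @Override
        public String toString() {
            return kind + " -> " + path;
        }
    }

    private SearcherDirLookup() {}

    /** Returns the first match in the documented order, or empty if none exists. */
    static Optional<Result> resolve(Path searcherDir) {
        // 1. A list of distributed search servers takes precedence.
        Path servers = searcherDir.resolve("search-servers.txt");
        if (Files.isRegularFile(servers)) {
            return Optional.of(new Result(Kind.DISTRIBUTED, servers));
        }
        // 2. Otherwise, a directory of merged indexes.
        Path index = searcherDir.resolve("index");
        if (Files.isDirectory(index)) {
            return Optional.of(new Result(Kind.MERGED_INDEX, index));
        }
        // 3. Finally, per-segment indexes.
        Path segments = searcherDir.resolve("segments");
        if (Files.isDirectory(segments)) {
            return Optional.of(new Result(Kind.SEGMENTS, segments));
        }
        return Optional.empty();
    }
}
```

The practical consequence is that searcher.dir should point at the crawl root, the directory that contains index/ or segments/, rather than at one of those subdirectories.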

On 7/19/07, Robert Young <[EMAIL PROTECTED]> wrote:
Tomcat only comes into it because we have to start Tomcat in the
searcher directory. I'm guessing it's the same however you choose to
use Nutch. It would still have to do a rename across physical volumes
if searcher.dir were set to something different, would it not?

How does this sound as a solution? Allow the user to set a
configuration option specifying the linkdb working dir, or allow the
user to set a configuration flag that names another configuration
option to use as the base dir. Otherwise, fall back to the default,
which is the current working directory.
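The proposed fallback chain could look roughly like this hypothetical Java sketch. The property names `linkdb.tmp.dir` and `linkdb.tmp.dir.source` are invented for illustration and are not existing Nutch settings. A plain `Map` stands in for the Hadoop/Nutch configuration object so the example is self-contained.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed base-dir resolution for LinkDb's
 * temporary working directory (newLinkDb would be created under it).
 * Both property names below are made up; they are not real Nutch settings.
 */
final class LinkDbWorkDir {

    static final String WORK_DIR_KEY = "linkdb.tmp.dir";        // hypothetical
    static final String SOURCE_KEY = "linkdb.tmp.dir.source";   // hypothetical

    private LinkDbWorkDir() {}

    static Path resolve(Map<String, String> conf) {
        // 1. An explicitly configured working directory wins.
        String explicit = conf.get(WORK_DIR_KEY);
        if (notBlank(explicit)) {
            return Paths.get(explicit);
        }
        // 2. Otherwise a flag may name another property (e.g. hadoop.tmp.dir)
        //    whose value is used as the base directory.
        String sourceKey = conf.get(SOURCE_KEY);
        if (notBlank(sourceKey)) {
            String base = conf.get(sourceKey);
            if (notBlank(base)) {
                return Paths.get(base);
            }
        }
        // 3. Fall back to the current behaviour: the JVM's working directory.
        return Paths.get(System.getProperty("user.dir"));
    }

    private static boolean notBlank(String s) {
        return s != null && !s.trim().isEmpty();
    }
}
```

As Andrzej notes in the quoted message, whichever directory is chosen should sit on the same physical volume as the final linkdb. Otherwise, the final rename can turn into a full copy.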

Cheers
Rob

On 7/19/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Robert Young wrote:
> > In org.apache.nutch.crawl.LinkDb on line 261 it creates a working
> > directory (newLinkDb) based on the current working directory. This
> > should be configurable rather than being based on where Tomcat was
> > started. I am planning on writing a patch to pull the hadoop.tmp.dir
> > setting if it is available, falling back to the current directory.
> >
> > Can anyone see any obvious problems with doing this?
>
> I'm not sure what Tomcat has to do with this. LinkDb does it this way in
> order to avoid rename() operation across physical volumes - if you
> invoke rename() on a local FS it may trigger a costly copy operation.
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
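To make Andrzej's point about rename() concrete, here is an illustration using plain java.nio rather than Hadoop's FileSystem API, so it does not show what LinkDb or Hadoop actually do. An atomic rename is only possible within one file store. Across volumes, the data must be copied and the source deleted, so the cost grows with the size of the tree. The class name `MoveOrCopy` is made up.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Illustration only (java.nio, not Hadoop): a cheap rename within one
 * file store versus an expensive copy + delete across file stores.
 */
final class MoveOrCopy {

    enum Outcome { RENAMED, COPIED }

    private MoveOrCopy() {}

    /** Tries an atomic rename first, then falls back to copy + delete. */
    static Outcome move(Path src, Path dst) throws IOException {
        try {
            // Same file store: a metadata-only rename, independent of data size.
            Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
            return Outcome.RENAMED;
        } catch (AtomicMoveNotSupportedException e) {
            // Different file store (e.g. another physical volume):
            // every file in the tree has to be rewritten.
            copyThenDelete(src, dst);
            return Outcome.COPIED;
        }
    }

    /** Recursively copies src to dst, then deletes src. */
    static void copyThenDelete(Path src, Path dst) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(src)) {
            // Pre-order walk: each directory appears before its children.
            entries = walk.collect(Collectors.toList());
        }
        for (Path p : entries) {
            Path target = dst.resolve(src.relativize(p).toString());
            if (Files.isDirectory(p)) {
                Files.createDirectories(target);
            } else {
                Files.copy(p, target, StandardCopyOption.COPY_ATTRIBUTES);
            }
        }
        // Reversed pre-order deletes children before their parent directories.
        Collections.reverse(entries);
        for (Path p : entries) {
            Files.delete(p);
        }
    }
}
```

For a large linkdb, the difference between the two branches is the difference between an instant rename and rewriting the whole database. That is why keeping newLinkDb on the same volume matters.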

--
"Conscious decisions by conscious minds are what make reality real"
