Ah, now I see what you're referring to. Thanks for the question; now I know why I was getting garbage in my directory a while back. So I guess you may need to edit that class. Are you running Hadoop in local mode?
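Editing the class along the lines of Robert's proposal in the quoted thread could look roughly like the sketch below. The proposed lookup order is an explicit LinkDb setting, then `hadoop.tmp.dir`, then the current working directory. This is a minimal sketch under stated assumptions, not Nutch code. The class `WorkingDirResolver`, its `resolve` method, and the key `linkdb.tmp.dir` are hypothetical names. `hadoop.tmp.dir` is the Hadoop property named in the thread. A plain `Map` stands in for Hadoop's `Configuration` so the example compiles on its own.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical helper, not part of Nutch. A plain Map stands in for
// Hadoop's Configuration so the sketch compiles without Hadoop.
final class WorkingDirResolver {

    // Made-up key for an explicit LinkDb working-dir setting (assumption).
    static final String LINKDB_TMP_KEY = "linkdb.tmp.dir";
    // Hadoop's temp-dir property, as mentioned in the thread.
    static final String HADOOP_TMP_KEY = "hadoop.tmp.dir";

    private WorkingDirResolver() {}

    /**
     * Returns base/dirName, where base is the first non-blank value of
     * linkdb.tmp.dir, then hadoop.tmp.dir, else the JVM's current
     * working directory (the behaviour described in the thread).
     */
    static Path resolve(Map<String, String> conf, String dirName) {
        String base = conf.get(LINKDB_TMP_KEY);
        if (isBlank(base)) {
            base = conf.get(HADOOP_TMP_KEY);
        }
        if (isBlank(base)) {
            // Same place a relative path would resolve against today.
            base = System.getProperty("user.dir");
        }
        return Paths.get(base, dirName);
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

Inside LinkDb, the `Map` lookups would presumably become `get(...)` calls on the job's Hadoop `Configuration`. Andrzej's reply still applies, though. If the chosen base directory sits on a different volume from the final linkdb, the last rename could turn into a full copy.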
On 7/19/07, Robert Young <[EMAIL PROTECTED]> wrote:
> Yes, I do this for the searcher directory, but in the LinkDb class it
> makes a reference to a Path which is relative (just for a temporary
> working directory). This is the problem: if I start Tomcat in a path
> where the Java user does not have permission to create a directory,
> then LinkDb fails.
>
> On 7/19/07, Briggs <[EMAIL PROTECTED]> wrote:
> > I don't use the Nutch web application, but you don't have to
> > start Nutch in the searcher directory. You can set the location of
> > the searcher dir in the nutch-site.xml config file.
> >
> > Add this node and set the location of your index:
> >
> > <property>
> >   <name>searcher.dir</name>
> >   <value>/your/path/to/your/index</value>
> >   <description>
> >     Path to root of crawl. This directory is searched (in
> >     order) for either the file search-servers.txt, containing a list of
> >     distributed search servers, or the directory "index" containing
> >     merged indexes, or the directory "segments" containing segment
> >     indexes.
> >   </description>
> > </property>
> >
> > On 7/19/07, Robert Young <[EMAIL PROTECTED]> wrote:
> > > Tomcat only comes into it because we have to start Tomcat in the
> > > searcher directory; I'm guessing it's the same however you choose to
> > > use Nutch. It would still have to do a rename across physical volumes
> > > if searcher.dir is set to something different, would it not?
> > >
> > > How does this sound as a solution? Allow the user to set a
> > > configuration option for the LinkDb working dir, or allow the user
> > > to set a configuration flag to use another particular configuration
> > > option as the base dir. Otherwise, fall back to the default, which
> > > is the current working directory.
> > >
> > > Cheers
> > > Rob
> > >
> > > On 7/19/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > > > Robert Young wrote:
> > > > > In org.apache.nutch.crawl.LinkDb on line 261, it creates a working
> > > > > directory (newLinkDb) based on the current working directory. This
> > > > > should be configurable rather than depending on where Tomcat was
> > > > > started. I am planning on writing a patch to pull the hadoop.tmp.dir
> > > > > setting if it is available, falling back to the current directory.
> > > > >
> > > > > Can anyone see any obvious problems with doing this?
> > > >
> > > > I'm not sure what Tomcat has to do with this. LinkDb does it this way
> > > > in order to avoid a rename() operation across physical volumes: if you
> > > > invoke rename() on a local FS, it may trigger a costly copy operation.
> > > >
> > > > --
> > > > Best regards,
> > > > Andrzej Bialecki
> > >
> >
> > --
> > "Conscious decisions by conscious minds are what make reality real"
>

Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
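Andrzej's objection about rename() across physical volumes can be illustrated outside Hadoop. The sketch below uses plain `java.nio`. It is not Hadoop's `FileSystem` API and not what LinkDb actually does. `sameVolume` checks whether two existing paths share a `FileStore`. `moveInto` tries an atomic rename first and falls back to an ordinary move, which may copy. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper using plain java.nio; not Hadoop's FileSystem API.
final class VolumeAwareMove {

    private VolumeAwareMove() {}

    /** True if two existing paths live on the same FileStore (volume). */
    static boolean sameVolume(Path a, Path b) throws IOException {
        return Files.getFileStore(a).equals(Files.getFileStore(b));
    }

    /**
     * Moves source to target. Returns true if an atomic rename worked,
     * false if the non-atomic fallback move had to be used.
     */
    static boolean moveInto(Path source, Path target) throws IOException {
        try {
            // On the same volume this is a plain rename (metadata only).
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            // Typically a different volume. For a regular file this copies
            // the data and deletes the source; for a non-empty directory
            // Files.move throws instead, so a recursive copy would be needed.
            Files.move(source, target);
            return false;
        }
    }
}
```

If the LinkDb working directory were moved under `hadoop.tmp.dir`, a check such as `sameVolume(tmpDir, linkDbParent)` could show whether the final rename would stay cheap.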