Re: [Nutch-dev] Looking to fix relative path issue in linkdb

Briggs Thu, 19 Jul 2007 10:32:31 -0700

Ahh, now I see what you are referring to.  Thanks for the question.
Now I know why I was getting garbage in my directory a while back.
So, I guess you may need to edit that class.  Are you using Hadoop in
local mode?
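
For what it's worth, the fallback Robert describes further down (use
hadoop.tmp.dir if it is set, otherwise the current working directory)
could be sketched roughly like this. This is only an illustration, not
the actual LinkDb code: the class LinkDbWorkDir, the method
resolveWorkDir and the directory name "linkdb-merge-123" are made up,
and java.util.Properties stands in for Hadoop's configuration object so
the snippet runs on its own.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of Nutch or Hadoop.
public class LinkDbWorkDir {

    // Returns the temporary working directory for a merge.
    // Assumption: the "hadoop.tmp.dir" key is read from a plain Properties
    // object here; real code would read it from the Hadoop configuration.
    public static Path resolveWorkDir(Properties conf, String dirName) {
        String base = conf.getProperty("hadoop.tmp.dir");
        if (base == null || base.trim().isEmpty()) {
            // Nothing configured: fall back to the current working directory.
            return Paths.get(dirName).toAbsolutePath();
        }
        // Configured: place the working directory under the configured base.
        return Paths.get(base.trim(), dirName).toAbsolutePath();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // No setting: resolves against wherever the JVM was started.
        System.out.println(resolveWorkDir(conf, "linkdb-merge-123"));

        conf.setProperty("hadoop.tmp.dir", "/tmp/hadoop-nutch");
        // With the setting: resolves under the configured base directory.
        System.out.println(resolveWorkDir(conf, "linkdb-merge-123"));
    }
}
```

Keep Andrzej's point in mind, though: if the configured directory ends
up on a different physical volume than the linkdb, the final rename()
may turn into a copy.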



On 7/19/07, Robert Young <[EMAIL PROTECTED]> wrote:
> Yes, I do this for the searcher directory, but the LinkDb class
> refers to a Path which is relative (just for a temporary working
> directory). This is the problem: if I start Tomcat in a path where
> the Java user does not have permission to create a directory, then
> LinkDb fails.
>
> On 7/19/07, Briggs <[EMAIL PROTECTED]> wrote:
> > I don't use the Nutch web application, but you don't have to
> > start Nutch in the searcher directory.  You can set the location of
> > the searcher dir within the nutch-site.xml config file.
> >
> > Add this node and set the location of your index:
> >
> > <property>
> >   <name>searcher.dir</name>
> >   <value>/your/path/to/your/index</value>
> >   <description>
> >   Path to root of crawl.  This directory is searched (in
> >   order) for either the file search-servers.txt, containing a list of
> >   distributed search servers, or the directory "index" containing
> >   merged indexes, or the directory "segments" containing segment
> >   indexes.
> >   </description>
> > </property>
> >
> > On 7/19/07, Robert Young <[EMAIL PROTECTED]> wrote:
> > > Tomcat only comes into it because we have to start Tomcat in the
> > > searcher directory; I'm guessing it's the same however you choose to
> > > use Nutch. Would it not still have to do a rename across physical
> > > volumes if searcher.dir is set to something different?
> > >
> > > How does this sound as a solution? Allow the user to set a
> > > configuration option for the linkdb working dir, or a configuration
> > > flag that selects another existing configuration option as the base
> > > dir. Otherwise fall back to the default, which is the current
> > > working directory.
> > >
> > > Cheers
> > > Rob
> > >
> > > On 7/19/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > > > Robert Young wrote:
> > > > > In org.apache.nutch.crawl.LinkDb on line 261 it creates a working
> > > > > directory (newLinkDb) based on the current working directory. This
> > > > > should be configurable rather than being based on where Tomcat was
> > > > > started. I am planning on writing a patch to pull the hadoop.tmp.dir
> > > > > setting if it is available, falling back to the current directory.
> > > > >
> > > > > Can anyone see any obvious problems with doing this?
> > > >
> > > > I'm not sure what Tomcat has to do with this. LinkDb does it this way in
> > > > order to avoid rename() operation across physical volumes - if you
> > > > invoke rename() on a local FS it may trigger a costly copy operation.
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrzej Bialecki     <><
> > > >   ___. ___ ___ ___ _ _   __________________________________
> > > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > > http://www.sigram.com  Contact: info at sigram dot com
> > > >
> > > >
> > >
> >
> >
> > --
> > "Conscious decisions by conscious minds are what make reality real"
> >
>


-- 
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
