Thank you Dave, very helpful. -Ledio
-----Original Message-----
From: Goldschmidt, Dave [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 20, 2005 7:24 AM
To: [email protected]
Subject: RE: [Nutch-dev] distributed search

Hi Rafi,

Not sure if anyone answered this, but I think you're just after the segslice command:

$ nutch segslice
SegmentSlicer (-local | -ndfs <namenode:port>) -o outputDir [-max count] [-fix]
  [-nocontent] [-noparsedata] [-noparsetext]
  [-filterUrlBy (+|-)perl5pattern] [-logLevel logLevel]
  (-dir segments | seg1 seg2 ...)

NOTE: at least one segment dir name is required, or the '-dir' option.
      outputDir is always required.

  -o outputDir   output directory for segments
  -max count     (optional) output multiple segments, each with a maximum of 'count' entries
  -fix           (optional) automatically fix corrupted segments
  -nocontent     (optional) ignore content data
  -noparsedata   (optional) ignore parse_data data
  -noparsetext   (optional) ignore parse_text data
  -filterUrlBy   (optional) filter entries by matching their URLs against a perl5 pattern.
                 Prefix '+' means: default to skip, match to save.
                 Prefix '-' means: default to save, match to skip.
                 If no pattern is given, no filtering is done (all entries are saved).
  -logLevel      (optional) logging level
  -dir segments  directory containing multiple segments
  seg1 seg2 ...  segment directories

HTH,
DaveG

-----Original Message-----
From: Ledio Ago [mailto:[EMAIL PROTECTED]
Sent: Monday, December 19, 2005 8:25 PM
To: [email protected]
Subject: RE: [Nutch-dev] distributed search

Rafi,

Based on what you're saying, this tool splits a fetchlist into several fetchlists so that we can crawl/fetch the URLs from different fetchers, right? If so, that is not what I'm after. I'm trying to split an existing index into smaller partitions, so that I can make those partitions searchable from multiple Nutch searchers, i.e. distributed search.
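[For illustration, an invocation of the segslice command above might look like the following; the directory names and the entry count are placeholders, not values taken from this thread:]

```
# Hypothetical example: slice the segments under 'segments/' into multiple
# output segments of at most 500000 entries each, writing to 'segments-split/'.
bin/nutch segslice -local -o segments-split -max 500000 -dir segments
```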
Thanks,
Ledio

-----Original Message-----
From: Rafi Iz [mailto:[EMAIL PROTECTED]
Sent: Monday, December 19, 2005 4:49 PM
To: [email protected]
Subject: Re: [Nutch-dev] distributed search

Check the next command:

FetchListTool (-local | -ndfs <namenode:port>) <db> <segment_dir>
  [-refetchonly] [-topN N] [-cutoff cutoffscore]
  [-numFetchers numFetchers] [-adddays numDays]

This command calls a function named emitMultipleLists, which emits several fetchlists so that you can fetch across several machines, e.g.:

bin/nutch org.apache.nutch.tools.FetchListTool ......

Rafi

>From: Stefan Groschupf <[EMAIL PROTECTED]>
>Reply-To: [email protected]
>To: [email protected]
>Subject: Re: [Nutch-dev] distributed search
>Date: Tue, 20 Dec 2005 00:38:22 +0100
>
>>By the way, is there an easy way to split the index I already have?
>>I would hate to recrawl all of the 1.9MM URLs again and waste bandwidth.
>
>Well, I do not know of any tool that comes with Nutch, or any other tool,
>that does it, though maybe there is one.
>But writing a Java class that creates two smaller indexes from one large
>one is very easy, an hour's work at most.
>Just check any of the existing Lucene tutorials, the Lucene javadoc, or
>the Lucene book.
>BTW, Erik Hatcher's book "Lucene in Action" is a MUST for all Nutch users.
>:-)
>
>Stefan
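[The index-splitting class Stefan describes could be sketched roughly as below. This is an assumption-laden illustration, not code from the thread: the class name, argument layout, and the first-half/second-half split are all hypothetical, and it targets the Lucene 1.x API in use at the time. One caveat: IndexReader.document(i) returns only *stored* fields, so fields that were indexed but not stored would not survive this kind of copy.]

```
// Hypothetical sketch: split one Lucene index into two smaller ones by
// copying documents. args[0] = source index, args[1]/args[2] = output dirs.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class IndexSplitter {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(args[0]);
        IndexWriter left  = new IndexWriter(args[1], new StandardAnalyzer(), true);
        IndexWriter right = new IndexWriter(args[2], new StandardAnalyzer(), true);
        int max = reader.maxDoc();
        for (int i = 0; i < max; i++) {
            if (reader.isDeleted(i)) continue;   // skip deleted documents
            // send the first half of the docs to one index, the rest to the other
            (i < max / 2 ? left : right).addDocument(reader.document(i));
        }
        left.optimize();  left.close();
        right.optimize(); right.close();
        reader.close();
    }
}
```

[Each resulting index could then be served by its own Nutch search server for distributed search.]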
