Re: Nutch-87 Setup

2005-09-15 Thread Matt Kangas
Michael, this looks like an error in your Nutch configuration, or possibly your CLASSPATH. I'd guess it's the former. Take a look at the following nutch-site.xml (or nutch-default) properties, and make sure they reference (a) the right place on disk and (b) plugins that actually exist:

Re: (NUTCH-88) Enhance ParserFactory plugin selection policy

2005-09-15 Thread Jon Shoberg
Jérôme Charron wrote: Hi, Chris, Sébastien and I have worked on a proposal for solving the NUTCH-88 issue. This proposal is available on the Nutch Wiki at http://wiki.apache.org/nutch/ParserFactoryImprovementProposal. Thanks for reading it, commenting on it, and voting for it (+ or -). Best

Re: [Nutch-cvs] [Nutch Wiki] Update of ParserFactoryImprovementProposal by ChrisMattmann

2005-09-15 Thread ogjunk-nutch
Quick comment about order=N and the paragraph that describes how to deal with cases where people mess things up and enter multiple plugins for the same content type and the same order. Why is the order attribute even needed? It looks like a redundant piece of information - why not derive order

[jira] Commented: (NUTCH-92) DistributedSearch incorrectly scores results

2005-09-15 Thread Otis Gospodnetic (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-92?page=comments#action_12329473 ] Otis Gospodnetic commented on NUTCH-92: I recall a discussion on the lucene-dev list several (6+?) months back about this or a very similar issue. Lucene's MultiSearcher has

[jira] Commented: (NUTCH-92) DistributedSearch incorrectly scores results

2005-09-15 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-92?page=comments#action_12329474 ] Doug Cutting commented on NUTCH-92: A minor detail: In Searcher, instead of int[] getDocFreqs(Term[]); The new method will probably have to be something like public

[jira] Commented: (NUTCH-92) DistributedSearch incorrectly scores results

2005-09-15 Thread Otis Gospodnetic (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-92?page=comments#action_12329476 ] Otis Gospodnetic commented on NUTCH-92: Ah, you are right, I remember this getting in the core. As a matter of fact, it might have been me who committed it in the end.
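
For readers following the NUTCH-92 comments, here is a rough sketch of why document frequencies matter when results from several search servers are merged. It assumes (the excerpts above do not spell this out) that the scoring problem comes from each server weighting a term with its own local document frequency. The server names and counts are made up, and the idf formula is the classic Lucene-style one, written in Python only for brevity; none of this is Nutch or Lucene code.

```python
import math


def idf(doc_freq, num_docs):
    """Classic Lucene-style inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-server statistics for one term:
# (documents containing the term, total documents on that server).
shards = {"server-a": (5, 1000), "server-b": (400, 1000)}

# Each server weights the term using only what it can see locally,
# so the same term gets a different weight on each server.
local_idf = {name: idf(df, n) for name, (df, n) in shards.items()}

# Summing the frequencies first yields a single weight that every
# server could use, which makes the merged scores comparable.
global_df = sum(df for df, _ in shards.values())
global_n = sum(n for _, n in shards.values())
global_idf = idf(global_df, global_n)

if __name__ == "__main__":
    for name, weight in sorted(local_idf.items()):
        print(f"{name}: local idf = {weight:.3f}")
    print(f"all servers: global idf = {global_idf:.3f}")
```

The term is rare on one server and common on the other, so the local weights differ widely. A hit from the first server would outrank an equally good hit from the second unless both use the shared value.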

Re: merge mapred to trunk

2005-09-15 Thread Doug Cutting
I will postpone the merge of the mapred branch into trunk until I have a chance to (a) add some MapReduce documentation; and (b) implement MapReduce-based dedup. Doug Doug Cutting wrote: Currently we have three versions of nutch: trunk, 0.7 and mapred. This increases the chances for

Re: Whole-web crawling with the mapreduce branch

2005-09-15 Thread Doug Cutting
For now, look at the source for crawl/Crawl.java. I'll try to add some documentation ASAP. Doug Steffen Viken Valvåg wrote: Hi, I'm playing around with the mapreduce branch, and got it working for a simple intranet crawl by following the nutch tutorial on

Re: [Nutch-cvs] [Nutch Wiki] Update of ParserFactoryImprovementProposal by ChrisMattmann

2005-09-15 Thread Chris Mattmann
Hi Otis, Point taken. In actuality since both convey the same information I think that it's okay to support both, but by default say we could code the initial plugins specified in parse-plugins.xml without the order= attribute. Fair enough? Cheers, Chris On 9/15/05 3:23 PM, [EMAIL

Re: [Nutch-cvs] [Nutch Wiki] Update of ParserFactoryImprovementProposal by ChrisMattmann

2005-09-15 Thread ogjunk-nutch
Sounds good to me. Otis --- Chris Mattmann [EMAIL PROTECTED] wrote: Hi Otis, Point taken. In actuality since both convey the same information I think that it's okay to support both, but by default say we could code the initial plugins specified in parse-plugins.xml without the order=
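
The outcome of the order= exchange above (derive the order from position, but still accept an explicit attribute) can be sketched roughly as follows. This is only an illustration in Python: the XML layout, the element and attribute names, and the plugin ids are assumptions made for the example, not the format defined in the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse-plugins.xml fragment. Element names, attribute names
# and plugin ids are assumptions for illustration only.
SAMPLE = """
<parse-plugins>
  <mimeType name="text/html">
    <plugin id="parse-text" order="2"/>
    <plugin id="parse-html" order="1"/>
  </mimeType>
  <mimeType name="application/pdf">
    <plugin id="parse-pdf"/>
    <plugin id="parse-text"/>
  </mimeType>
</parse-plugins>
"""


def plugin_preferences(xml_text):
    """Return {content type: [plugin ids, most preferred first]}."""
    root = ET.fromstring(xml_text.strip())
    prefs = {}
    for mime in root.findall("mimeType"):
        ranked = []
        for position, plugin in enumerate(mime.findall("plugin")):
            explicit = plugin.get("order")
            # Without an explicit order, the position in the file is the order.
            rank = int(explicit) if explicit is not None else position + 1
            # The position also breaks ties between equal order values.
            ranked.append((rank, position, plugin.get("id")))
        prefs[mime.get("name")] = [pid for _, _, pid in sorted(ranked)]
    return prefs


if __name__ == "__main__":
    for content_type, plugins in plugin_preferences(SAMPLE).items():
        print(content_type, "->", plugins)
```

Using the file position as a tie-breaker also gives a deterministic answer when two plugins are entered with the same order for one content type.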