Michael, this looks like an error in your Nutch configuration, or
possibly in your CLASSPATH; I'd guess it's the former. Take a look at
the following properties in nutch-site.xml (or nutch-default.xml), and
make sure they reference (a) the right place on disk and (b) plugins
that actually exist:
-
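The property list itself did not survive in this copy of the thread. Here is a minimal sketch that scripts both checks. It assumes the settings in question are Nutch's `plugin.folders` (where plugins live on disk) and `plugin.includes` (a regular expression selecting plugin ids). `PluginConfigCheck` is a hypothetical helper, not part of Nutch. It also assumes each plugin sits in a subdirectory named after its id.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch. Checks (a) that a plugin folder exists on
// disk and (b) which plugin directories in it match a plugin.includes-style regex.
// Assumption: each plugin lives in a subdirectory named after its plugin id.
class PluginConfigCheck {

    // Returns the sorted names of subdirectories of pluginFolder whose names fully
    // match includesRegex. An empty result means check (a) or check (b) failed.
    static List<String> matchingPlugins(File pluginFolder, String includesRegex) {
        List<String> matches = new ArrayList<>();
        File[] children = pluginFolder.listFiles();
        if (children == null) {
            // listFiles() returns null when the path is missing or not a directory.
            return matches;
        }
        Pattern includes = Pattern.compile(includesRegex);
        for (File child : children) {
            if (child.isDirectory() && includes.matcher(child.getName()).matches()) {
                matches.add(child.getName());
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        // Example values only; pass your own folder and regex on the command line.
        String folder = args.length > 0 ? args[0] : "plugins";
        String includes = args.length > 1 ? args[1] : "protocol-http|parse-(text|html)";
        List<String> found = matchingPlugins(new File(folder), includes);
        if (found.isEmpty()) {
            System.out.println("No plugins matched; check the folder path and the regex.");
        } else {
            System.out.println("Matched plugins: " + found);
        }
    }
}
```

In this sketch the regex must match the whole directory name (`Matcher.matches()`), so a pattern of `parse-html` would not also select a hypothetical `parse-html2`.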
Jérôme Charron wrote:
Hi,
Chris, Sébastien and I have worked on a proposal for solving the NUTCH-88
issue.
This proposal is available on the Nutch Wiki at
http://wiki.apache.org/nutch/ParserFactoryImprovementProposal.
Thanks for reading it, commenting on it, and voting on it (+ or -).
Best
Quick comment about order=N and the paragraph that describes how to
deal with cases where people mess things up and enter multiple plugins
for the same content type and the same order:
- Why is the order attribute even needed? It looks like a redundant
piece of information; why not derive order
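The question above is whether order can simply be derived. Here is a minimal sketch of that idea: it takes the order of `<plugin>` entries from their position in the file, using the standard DOM API. The element and attribute names (`parse-plugins`, `mimeType`, `name`, `plugin`, `id`) are assumptions for illustration, not the proposal's confirmed parse-plugins.xml format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: derive plugin order from document position instead of an order= attribute.
// Element and attribute names are hypothetical, not the proposal's confirmed format.
class PositionalOrder {

    // Returns the plugin ids registered for mimeType, first-listed first.
    static List<String> pluginsFor(String xml, String mimeType) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse plugin config", e);
        }
        List<String> ordered = new ArrayList<>();
        NodeList types = doc.getElementsByTagName("mimeType");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            if (!mimeType.equals(type.getAttribute("name"))) {
                continue;
            }
            // getElementsByTagName returns elements in document order,
            // so the list index itself is the derived order.
            NodeList plugins = type.getElementsByTagName("plugin");
            for (int j = 0; j < plugins.getLength(); j++) {
                ordered.add(((Element) plugins.item(j)).getAttribute("id"));
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        String xml = "<parse-plugins>"
                + "<mimeType name=\"text/html\">"
                + "<plugin id=\"parse-html\"/><plugin id=\"parse-text\"/>"
                + "</mimeType></parse-plugins>";
        System.out.println(pluginsFor(xml, "text/html")); // [parse-html, parse-text]
    }
}
```

With purely positional order, two plugins can never share a rank, so the duplicate-order case disappears. The trade-off is that reordering means moving XML elements rather than editing a number.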
[ http://issues.apache.org/jira/browse/NUTCH-92?page=comments#action_12329473 ]
Otis Gospodnetic commented on NUTCH-92:
---
I recall a discussion on the lucene-dev list several (6+?) months back about
this or a very similar issue. Lucene's MultiSearcher has
[ http://issues.apache.org/jira/browse/NUTCH-92?page=comments#action_12329474 ]
Doug Cutting commented on NUTCH-92:
---
A minor detail:
In Searcher, instead of
int[] getDocFreqs(Term[] terms);
The new method will probably have to be something like
public
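The `getDocFreqs` method discussed above points at collecting document frequencies from every searcher, so that a term's idf is computed from global rather than per-server counts and scores stay comparable across servers. Here is a minimal sketch of that aggregation. `DocFreqSource` is a hypothetical interface standing in for the new Searcher method, whose real signature is cut off above. The idf formula is Lucene's classic `DefaultSimilarity` one, `1 + ln(numDocs / (docFreq + 1))`; whether Nutch would use it unchanged is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of global document-frequency aggregation across several searchers.
// DocFreqSource is hypothetical; it is not Nutch's or Lucene's actual API.
class GlobalDocFreqs {

    interface DocFreqSource {
        int[] getDocFreqs(String[] terms); // one count per term, in input order
        int maxDoc();                      // number of documents in this source
    }

    // In-memory source for demonstration: term -> document frequency.
    static class MapSource implements DocFreqSource {
        private final Map<String, Integer> freqs;
        private final int maxDoc;

        MapSource(Map<String, Integer> freqs, int maxDoc) {
            this.freqs = freqs;
            this.maxDoc = maxDoc;
        }

        public int[] getDocFreqs(String[] terms) {
            int[] out = new int[terms.length];
            for (int i = 0; i < terms.length; i++) {
                out[i] = freqs.getOrDefault(terms[i], 0);
            }
            return out;
        }

        public int maxDoc() {
            return maxDoc;
        }
    }

    // Sums per-term document frequencies over all sources.
    static int[] aggregate(List<? extends DocFreqSource> sources, String[] terms) {
        int[] total = new int[terms.length];
        for (DocFreqSource source : sources) {
            int[] local = source.getDocFreqs(terms);
            for (int i = 0; i < terms.length; i++) {
                total[i] += local[i];
            }
        }
        return total;
    }

    // Sums document counts over all sources.
    static int totalDocs(List<? extends DocFreqSource> sources) {
        int n = 0;
        for (DocFreqSource source : sources) {
            n += source.maxDoc();
        }
        return n;
    }

    // Lucene's classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>();
        a.put("nutch", 3);
        Map<String, Integer> b = new HashMap<>();
        b.put("nutch", 2);
        List<MapSource> sources = Arrays.asList(new MapSource(a, 10), new MapSource(b, 20));
        int[] df = aggregate(sources, new String[] {"nutch"});
        // Every server now scores "nutch" with the same global idf.
        System.out.println("global df=" + df[0] + ", idf=" + idf(df[0], totalDocs(sources)));
    }
}
```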
[ http://issues.apache.org/jira/browse/NUTCH-92?page=comments#action_12329476 ]
Otis Gospodnetic commented on NUTCH-92:
---
Ah, you are right; I remember this getting into the core. As a matter of fact,
it might have been me who committed it in the end.
I will postpone the merge of the mapred branch into trunk until I have a
chance to (a) add some MapReduce documentation and (b) implement
MapReduce-based dedup.
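MapReduce-based dedup is only named here, not specified. Here is a minimal sketch that simulates both phases in memory. It assumes duplicates are pages with the same content digest, and that the copy to keep is the highest-scoring one. That policy is illustrative, not necessarily what Nutch ended up doing. The map phase keys each page by digest and the reduce phase picks one survivor per key. `Page` and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of a MapReduce content dedup. Page, the digest key and the
// "highest score wins" policy are illustrative assumptions, not Nutch's implementation.
class DedupSketch {

    static final class Page {
        final String url;
        final String digest; // e.g. a hash of the page content
        final float score;

        Page(String url, String digest, float score) {
            this.url = url;
            this.digest = digest;
            this.score = score;
        }
    }

    // Map + shuffle: emit (digest, page) and group the values by key.
    static Map<String, List<Page>> mapAndShuffle(List<Page> pages) {
        Map<String, List<Page>> groups = new TreeMap<>();
        for (Page p : pages) {
            groups.computeIfAbsent(p.digest, k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    // Reduce: keep the highest-scoring page; on a tie, keep the shorter URL (assumed rule).
    static Page reduce(List<Page> group) {
        Page best = group.get(0);
        for (Page p : group) {
            boolean higher = p.score > best.score;
            boolean tieButShorter = p.score == best.score && p.url.length() < best.url.length();
            if (higher || tieButShorter) {
                best = p;
            }
        }
        return best;
    }

    // Returns the sorted URLs that survive dedup.
    static List<String> dedup(List<Page> pages) {
        List<String> kept = new ArrayList<>();
        for (List<Page> group : mapAndShuffle(pages).values()) {
            kept.add(reduce(group).url);
        }
        Collections.sort(kept);
        return kept;
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
                new Page("http://a.example/", "d1", 1.0f),
                new Page("http://a.example/index.html", "d1", 0.5f),
                new Page("http://b.example/", "d2", 0.7f));
        System.out.println(dedup(pages)); // [http://a.example/, http://b.example/]
    }
}
```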
Doug
Doug Cutting wrote:
Currently we have three versions of Nutch: trunk, 0.7 and mapred. This
increases the chances for
For now, look at the source for crawl/Crawl.java.
I'll try to add some documentation ASAP.
Doug
Steffen Viken Valvåg wrote:
Hi,
I'm playing around with the mapreduce branch, and got it working for a
simple intranet crawl by following the Nutch tutorial on
Hi Otis,
Point taken. Since both convey the same information, I think it's okay
to support both, but by default we could list the initial plugins in
parse-plugins.xml without the order= attribute. Fair
enough?
Cheers,
Chris
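Supporting both forms, as proposed here, needs one rule that also covers the duplicate-order case raised earlier in the thread. Here is a minimal sketch of such a rule. It assumes that an explicit order= value wins when present, and that an entry without one takes its 1-based position in the file. Equal values are broken by file position, so the result is always deterministic. `PluginOrder` and `Entry` are hypothetical names, and this tie rule is one reasonable choice, not the proposal's confirmed behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical resolution of plugin order when order= is optional.
// Assumed rule: explicit order if given, else 1-based file position;
// ties between equal orders are broken by file position.
class PluginOrder {

    static final class Entry {
        final String pluginId;
        final Integer order; // null when the order= attribute was omitted
        final int position;  // 0-based position of the entry in the file

        Entry(String pluginId, Integer order, int position) {
            this.pluginId = pluginId;
            this.order = order;
            this.position = position;
        }
    }

    static int effectiveOrder(Entry e) {
        return e.order != null ? e.order : e.position + 1;
    }

    // Returns plugin ids sorted by effective order, then by file position.
    static List<String> resolve(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt((Entry e) -> effectiveOrder(e))
                .thenComparingInt(e -> e.position));
        List<String> ids = new ArrayList<>();
        for (Entry e : sorted) {
            ids.add(e.pluginId);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two plugins mistakenly share order=1; file position decides between them.
        List<Entry> entries = Arrays.asList(
                new Entry("parse-text", 2, 0),
                new Entry("parse-html", 1, 1),
                new Entry("parse-msword", 1, 2));
        System.out.println(resolve(entries)); // [parse-html, parse-msword, parse-text]
    }
}
```

With no order= attributes at all (the proposed default), the rule reduces to plain file order.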
On 9/15/05 3:23 PM, [EMAIL
Sounds good to me.
Otis
--- Chris Mattmann [EMAIL PROTECTED] wrote:
Hi Otis,
Point taken. In actuality since both convey the same information I think
that it's okay to support both, but by default say we could code the
initial plugins specified in parse-plugins.xml without the order=