Hi

I am having a problem with the nutch-0.9 fetcher. During the fetch process I get the following messages in my hadoop.log:

2007-06-12 12:23:25,892 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-06-12 12:23:25,892 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-06-12 12:23:25,892 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-06-12 12:23:25,892 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-06-12 12:23:25,905 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-06-12 12:23:25,905 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-06-12 12:23:25,905 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-06-12 12:23:25,905 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-06-12 12:23:25,990 WARN regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default

This is the last message logged before the process starts using 100% of the system resources. It never exits and never reports any other errors.

I am using the local file system on a single machine without map-reduce. I have tried several configurations, including JDK 5 and JDK 6, with the same result. I have had success crawling a different list of URLs with the exact same settings on the same machine.

~Jason


Jason Stubblefield
[EMAIL PROTECTED]

Please enjoy one of my web properties:

http://www.geothingy.com/
http://www.fivemushrooms.com/
http://www.wikitourist.com/

