Hello everyone. 
I crawled many pages using Nutch 0.9 (Java), under Windows with Cygwin.
I am interested in getting the HTML content of each page. (I do not want to
view the pages on screen.) For each page I have to create a document with
Lucene. A document will have multiple fields, and one of them is the content
of the page.
The way to get the content (as I found on Google) is this:

import org.apache.nutch.searcher.Hit;
import org.apache.nutch.searcher.HitDetails;
import org.apache.nutch.searcher.Hits;
import org.apache.nutch.searcher.NutchBean;
import org.apache.nutch.searcher.Query;
import org.apache.nutch.util.NutchConfiguration;
import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.parse.ParseText;
import org.apache.hadoop.fs.Path;


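 // Load the Nutch configuration.
 Configuration conf = NutchConfiguration.create();

 // Open the crawl's indexes for searching.
 NutchBean nb = new NutchBean(conf);

 // Ask for the top 10 hits of the query.
 Hits hits = nb.search(Query.parse("i*", conf), 10);

 // Guard against fewer than six hits before reading index 5.
 if (hits != null && hits.getLength() > 5) {
     Hit hit = hits.getHit(5);
     HitDetails hitDetails = nb.getDetails(hit);
     ParseText pText = nb.getParseText(hitDetails);

     // Print the parsed text of the page.
     System.out.println(pText.getText());
 }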




However, it produces this error trace:

12/10/2008 13:21:08 INFO searcher.NutchBean: opening indexes in crawl/indexes
12/10/2008 13:21:08 WARN plugin.PluginRepository: Plugins: directory not found: plugins
12/10/2008 13:21:08 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
12/10/2008 13:21:08 INFO plugin.PluginRepository: Registered Plugins:
12/10/2008 13:21:08 INFO plugin.PluginRepository: NONE
12/10/2008 13:21:08 INFO plugin.PluginRepository: Registered Extension-Points:
12/10/2008 13:21:08 INFO plugin.PluginRepository: NONE
java.lang.RuntimeException: org.apache.nutch.searcher.QueryFilter not found.
    at org.apache.nutch.searcher.QueryFilters.<init>(QueryFilters.java:60)
    at org.apache.nutch.searcher.IndexSearcher.init(IndexSearcher.java:79)
    at org.apache.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:63)
    at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:140)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:106)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:84)
    at parsenutchcontent.Main.main(Main.java:29)

Does anyone have any idea how I can fix this error? To clarify, I do not want
to view the pages through the Nutch graphical interface. I need to build a
library of documents that will be the input for a page-ranking algorithm,
namely the Bayes algorithm from the Mallet API.
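
One thing I notice in the trace is the WARN line saying the plugins directory
was not found. As a diagnostic, here is a minimal sketch that checks whether
the names Nutch needs are visible on the classpath of my program. It assumes
(I have not confirmed this in the Nutch source) that a relative plugin path
such as plugins is resolved through the class loader. The class name
PluginPathCheck and its methods are my own and are not part of Nutch.

```java
import java.net.URL;

public class PluginPathCheck {

    /** Returns the classpath URL for the given resource name, or null if it is not visible. */
    public static URL findOnClasspath(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(name);
    }

    /** Builds a one-line report saying whether the resource was found. */
    public static String describe(String name) {
        URL url = findOnClasspath(name);
        return url == null
                ? name + ": not found on classpath"
                : name + ": found at " + url;
    }

    public static void main(String[] args) {
        // "plugins" is the relative directory named in the WARN line.
        System.out.println(describe("plugins"));
        // "nutch-default.xml" is the default Nutch configuration file.
        System.out.println(describe("nutch-default.xml"));
    }
}
```

If plugins is reported as not found, that would be consistent with the WARN
line in the trace, and would point at the classpath of my program rather than
at the search code itself.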

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-RuntimeException-org-apache-nutch-searcher-QueryFilter-not-found-tp2053215p2053215.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
