Hi,

I am trying to set up Nutch on Tomcat 5.5 / Windows 2000 / jdk1.5.0_04 / latest Cygwin. I think I am about 99% of the way there, but I have finally hit a stumbling block. I followed the instructions to a T: set up the war in the root context, modified the config files, set the NUTCH_JAVA_HOME environment variable, etc.

I have 2 problems:

1. The crawl doesn't seem to be working. The crawled dir gets created, but see the log below: 0 records processed.

My second problem is with the servlet (see 2. below).

Thanks in advance for the help.

crawl-urlfilter.txt

-^(file|ftp|mailto):
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG)$
[EMAIL PROTECTED]
+^http://([a-z0-9]*\.)* irs.gov/
-.

urls

http://www.irs.gov/

Log:

run java in C:\Program Files\Java\jdk1.5.0_04
060225 233931 parsing file:/T:/nutch-0.7.1/conf/nutch-default.xml
060225 233931 parsing file:/T:/nutch-0.7.1/conf/crawl-tool.xml
060225 233931 parsing file:/T:/nutch-0.7.1/conf/nutch-site.xml
060225 233931 No FS indicated, using default:local
060225 233931 crawl started in: crawled
060225 233931 rootUrlFile = urls
060225 233931 threads = 10
060225 233931 depth = 3
060225 233932 Created webdb at LocalFS,T:\nutch-0.7.1\crawled\db
060225 233932 Starting URL processing
060225 233932 Plugins: looking in: T:\nutch-0.7.1\plugins
060225 233932 not including: T:\nutch-0.7.1\plugins\clustering-carrot2
060225 233932 not including: T:\nutch-0.7.1\plugins\creativecommons
060225 233932 parsing: T:\nutch-0.7.1\plugins\index-basic\plugin.xml
060225 233932 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060225 233932 not including: T:\nutch-0.7.1\plugins\index-more
060225 233932 not including: T:\nutch-0.7.1\plugins\language-identifier
060225 233932 parsing: T:\nutch-0.7.1\plugins\nutch-extensionpoints\plugin.xml
060225 233932 not including: T:\nutch-0.7.1\plugins\ontology
060225 233932 not including: T:\nutch-0.7.1\plugins\parse-ext
060225 233932 parsing: T:\nutch-0.7.1\plugins\parse-html\plugin.xml
060225 233932 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
060225 233932 not including: T:\nutch-0.7.1\plugins\parse-js
060225 233932 not including: T:\nutch-0.7.1\plugins\parse-msword
060225 233932 not including: T:\nutch-0.7.1\plugins\parse-pdf
060225 233932 not including: T:\nutch-0.7.1\plugins\parse-rss
060225 233932 parsing: T:\nutch-0.7.1\plugins\parse-text\plugin.xml
060225 233932 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
060225 233932 not including: T:\nutch-0.7.1\plugins\protocol-file
060225 233932 not including: T:\nutch-0.7.1\plugins\protocol-ftp
060225 233932 parsing: T:\nutch-0.7.1\plugins\protocol-http\plugin.xml
060225 233932 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
060225 233932 not including: T:\nutch-0.7.1\plugins\protocol-httpclient
060225 233932 parsing: T:\nutch-0.7.1\plugins\query-basic\plugin.xml
060225 233933 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
060225 233933 not including: T:\nutch-0.7.1\plugins\query-more
060225 233933 parsing: T:\nutch-0.7.1\plugins\query-site\plugin.xml
060225 233933 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060225 233933 parsing: T:\nutch-0.7.1\plugins\query-url\plugin.xml
060225 233933 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060225 233933 not including: T:\nutch-0.7.1\plugins\urlfilter-prefix
060225 233933 parsing: T:\nutch-0.7.1\plugins\urlfilter-regex\plugin.xml
060225 233933 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
060225 233933 found resource crawl-urlfilter.txt at file:/T:/nutch-0.7.1/conf/crawl-urlfilter.txt
.060225 233933 Added 0 pages
060225 233933 FetchListTool started
060225 233933 Overall processing: Sorted 0 entries in 0.0 seconds.
060225 233933 Overall processing: Sorted NaN entries/second
060225 233933 FetchListTool completed
060225 233933 logging at INFO
060225 233934 Updating T:\nutch-0.7.1\crawled\db
060225 233934 Updating for T:\nutch-0.7.1\crawled\segments\20060225233933
060225 233934 Finishing update
060225 233934 Update finished
060225 233934 FetchListTool started
060225 233935 Overall processing: Sorted 0 entries in 0.0 seconds.
060225 233935 Overall processing: Sorted NaN entries/second
060225 233935 FetchListTool completed
060225 233935 logging at INFO
060225 233936 Updating T:\nutch-0.7.1\crawled\db
060225 233936 Updating for T:\nutch-0.7.1\crawled\segments\20060225233934
060225 233936 Finishing update
060225 233936 Update finished
060225 233936 FetchListTool started
060225 233936 Overall processing: Sorted 0 entries in 0.0 seconds.
060225 233936 Overall processing: Sorted NaN entries/second
060225 233936 FetchListTool completed
060225 233936 logging at INFO
060225 233937 Updating T:\nutch-0.7.1\crawled\db
060225 233938 Updating for T:\nutch-0.7.1\crawled\segments\20060225233936
060225 233938 Finishing update
060225 233938 Update finished
060225 233938 Updating T:\nutch-0.7.1\crawled\segments from T:\nutch-0.7.1\crawled\db
060225 233938 reading T:\nutch-0.7.1\crawled\segments\20060225233933
060225 233938 reading T:\nutch-0.7.1\crawled\segments\20060225233934
060225 233938 reading T:\nutch-0.7.1\crawled\segments\20060225233936
060225 233938 Sorting pages by url...
060225 233938 Getting updated scores and anchors from db...
060225 233938 Sorting updates by segment...
060225 233938 Updating segments...
060225 233938 Done updating T:\nutch-0.7.1\crawled\segments from T:\nutch-0.7.1\crawled\db
060225 233938 indexing segment: T:\nutch-0.7.1\crawled\segments\20060225233933
060225 233938 * Opening segment 20060225233933
060225 233938 * Indexing segment 20060225233933
060225 233938 * Optimizing index...
060225 233938 * Moving index to NFS if needed...
060225 233938 DONE indexing segment 20060225233933: total 0 records in 0.14 s (NaN rec/s).
060225 233938 done indexing
060225 233938 indexing segment: T:\nutch-0.7.1\crawled\segments\20060225233934
060225 233938 * Opening segment 20060225233934
060225 233938 * Indexing segment 20060225233934
060225 233938 * Optimizing index...
060225 233938 * Moving index to NFS if needed...
060225 233938 DONE indexing segment 20060225233934: total 0 records in 0.031 s (NaN rec/s).
060225 233938 done indexing
060225 233938 indexing segment: T:\nutch-0.7.1\crawled\segments\20060225233936
060225 233938 * Opening segment 20060225233936
060225 233938 * Indexing segment 20060225233936
060225 233938 * Optimizing index...
060225 233938 * Moving index to NFS if needed...
060225 233938 DONE indexing segment 20060225233936: total 0 records in 0.032 s (NaN rec/s).
060225 233938 done indexing
060225 233938 Reading url hashes...
060225 233938 Sorting url hashes...
060225 233938 Deleting url duplicates...
060225 233938 Deleted 0 url duplicates.
060225 233938 Reading content hashes...
060225 233938 Sorting content hashes...
060225 233938 Deleting content duplicates...
060225 233938 Deleted 0 content duplicates.
060225 233938 Duplicate deletion complete locally. Now returning to NFS...
060225 233938 DeleteDuplicates complete
060225 233938 Merging segment indexes...
060225 233938 crawl finished: crawled
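
Since the log reports "Added 0 pages", one thing worth ruling out is whether the seed URL survives the filter at all. The following is only a minimal sketch in Python, not Nutch code. It assumes that the rules are tried top to bottom, that the first pattern found anywhere in the URL decides (+ accepts, - rejects), and that Python's regex behaviour matches Java's for these simple patterns. The names accepts, RULES_AS_POSTED and RULES_NO_SPACE are made up for illustration. The [EMAIL PROTECTED] line is left out because the archive obscured its content, and the space before irs.gov is kept exactly as it appears above, although it may only be an artifact of mail wrapping.

```python
import re

# Rules copied from the crawl-urlfilter.txt listing in this post.
# The "[EMAIL PROTECTED]" line is omitted: its real content is not recoverable here.
RULES_AS_POSTED = [
    r"-^(file|ftp|mailto):",
    r"-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png|PNG)$",
    r"+^http://([a-z0-9]*\.)* irs.gov/",  # note the space, kept as posted
    r"-.",
]

# Hypothetical variant with the space removed, for comparison only.
RULES_NO_SPACE = [rule.replace(")* irs", ")*irs") for rule in RULES_AS_POSTED]


def accepts(url, rules):
    """Return True if the first rule whose pattern is found in url starts with '+'.

    Assumption: first-match-wins evaluation; a URL matching no rule is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    seed = "http://www.irs.gov/"
    print("as posted:    ", accepts(seed, RULES_AS_POSTED))
    print("without space:", accepts(seed, RULES_NO_SPACE))
```

Under those assumptions, the seed URL falls through to the final "-." rule when the space is present and is accepted once it is removed, so it may be worth checking whether the space really is in the file on disk.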

2. Nutch seems to launch fine at http://24.75.221.234:8080/

When you search, you get the following error. Is this maybe because I haven't completed a good crawl yet?

org.apache.jasper.JasperException
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:370)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:291)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:241)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

root cause

java.lang.NullPointerException
org.apache.nutch.searcher.NutchBean.init(NutchBean.java:96)
org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:82)
org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:72)
org.apache.nutch.searcher.NutchBean.get(NutchBean.java:64)
org.apache.jsp.search_jsp._jspService(org.apache.jsp.search_jsp:112)
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:322)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:291)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:241)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

Richard Braman
mailto:[EMAIL PROTECTED]
561.748.4002 (voice)
http://www.taxcodesoftware.org
Free Open Source Tax Software

coming soon: nutch.taxcodesoftware.org
Open directory of tax software development.