I am trying to set up Nutch with an intranet. I used Nutch 0.7 with Java J2SE 1.4.2 and Tomcat 4.1.31.
I did the crawl with the command bin/nutch crawl bin/urls.txt -dir crawl.test -depth 3 >& crawl.log and the crawl.log gave log messages that appeared to imply that it was a successful run. (Crawl.log is copied after the Java/JSP errors below) and I set JAVA_HOME and NUTCH_JAVA_HOME to the J2re when I did the crawl, but I set JAVA_HOME to the j2se when I ran tomcat and i went to http://localhost:8080 I tried to search something and I got this error of the Nutch Bean. Did I configure something wrong? How can I fix this? Diane Palla Web Services Developer Seton Hall University 973 313-6199 [EMAIL PROTECTED] org.apache.jasper.JasperException at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:207) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:240) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:187) at javax.servlet.http.HttpServlet.service(HttpServlet.java:809) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:200) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:146) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:209) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:144) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948) at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2358) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:133) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596) at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:118) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:594) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:116) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:594) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:127) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948) at org.apache.coyote.tomcat4.CoyoteAdapter.service(CoyoteAdapter.java:152) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:799) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:705) at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:577) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683) at java.lang.Thread.run(Thread.java:534) root cause java.lang.NullPointerException at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:96) at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:82) at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:72) at org.apache.nutch.searcher.NutchBean.get(NutchBean.java:64) at org.apache.jsp.search_jsp._jspService(search_jsp.java:108) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:92) at javax.servlet.http.HttpServlet.service(HttpServlet.java:809) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:162) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:240) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:187) at javax.servlet.http.HttpServlet.service(HttpServlet.java:809) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:200) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:146) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:209) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:144) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948) at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2358) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:133) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596) at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:118) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:594) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:116) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:594) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:127) at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948) at org.apache.coyote.tomcat4.CoyoteAdapter.service(CoyoteAdapter.java:152) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:799) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:705) at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:577) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683) at java.lang.Thread.run(Thread.java:534) Crawl.log: run java in /usr/java/j2re1.4.2_02 050818 140148 parsing file:/gartner/httpd/html/nutch-0.7/conf/nutch-default.xml 050818 140149 parsing file:/gartner/httpd/html/nutch-0.7/conf/crawl-tool.xml 050818 140149 parsing file:/gartner/httpd/html/nutch-0.7/conf/nutch-site.xml 050818 140149 No FS indicated, using default:local 050818 140149 crawl started in: crawl.test 050818 140149 rootUrlFile = bin/urls.txt 050818 140149 threads = 10 050818 140149 depth = 3 050818 140149 Created webdb at LocalFS,/gartner/httpd/html/nutch-0.7/crawl.test/db 050818 140149 Starting URL processing 050818 140149 Plugins: looking in: /gartner/httpd/html/nutch-0.7/plugins 050818 140149 not including: /gartner/httpd/html/nutch-0.7/plugins/clustering-carrot2 050818 140149 not including: /gartner/httpd/html/nutch-0.7/plugins/creativecommons 050818 140149 parsing: /gartner/httpd/html/nutch-0.7/plugins/index-basic/plugin.xml 050818 140150 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/index-more 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/language-identifier 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/ontology 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/parse-ext 050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/parse-html/plugin.xml 050818 140150 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser 050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/parse-js/plugin.xml 050818 140150 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.js.JSParseFilter 050818 140150 impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parse.js.JSParseFilter 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/parse-msword 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/parse-pdf 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/parse-rss 050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/parse-text/plugin.xml 050818 140150 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/protocol-file 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/protocol-ftp 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/protocol-http 050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/protocol-httpclient/plugin.xml 050818 140150 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http 050818 140150 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http 050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/query-basic/plugin.xml 050818 140150 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/query-more 050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/query-site/plugin.xml 050818 140150 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter 050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/query-url/plugin.xml 050818 140150 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter 050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/urlfilter-prefix 050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/urlfilter-regex/plugin.xml 050818 140150 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter 050818 140150 found resource crawl-urlfilter.txt at file:/gartner/httpd/html/nutch-0.7/conf/crawl-urlfilter.txt 050818 140150 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer 050818 140150 Added 1 pages 050818 140150 Processing pagesByURL: Sorted 1 instructions in 0.014 seconds. 050818 140150 Processing pagesByURL: Sorted 71.42857142857143 instructions/second 050818 140150 Processing pagesByURL: Merged to new DB containing 1 records in 0.0070 seconds 050818 140150 Processing pagesByURL: Merged 142.85714285714286 records/second 050818 140150 Processing pagesByMD5: Sorted 1 instructions in 0.0020 seconds. 050818 140150 Processing pagesByMD5: Sorted 500.0 instructions/second 050818 140150 Processing pagesByMD5: Merged to new DB containing 1 records in 0.0030 seconds 050818 140150 Processing pagesByMD5: Merged 333.3333333333333 records/second 050818 140150 Processing linksByMD5: Copied file (4096 bytes) in 0.01 secs. 050818 140150 Processing linksByURL: Copied file (4096 bytes) in -0.0020 secs. 050818 140150 FetchListTool started 050818 140151 Processing pagesByURL: Sorted 1 instructions in 0.106 seconds. 050818 140151 Processing pagesByURL: Sorted 9.433962264150944 instructions/second 050818 140151 Processing pagesByURL: Merged to new DB containing 1 records in 0.0 seconds 050818 140151 Processing pagesByURL: Merged Infinity records/second 050818 140151 Processing pagesByMD5: Sorted 1 instructions in 0.0020 seconds. 050818 140151 Processing pagesByMD5: Sorted 500.0 instructions/second 050818 140151 Processing pagesByMD5: Merged to new DB containing 1 records in 0.0020 seconds 050818 140151 Processing pagesByMD5: Merged 500.0 records/second 050818 140151 Processing linksByMD5: Copied file (4096 bytes) in 0.0010 secs. 050818 140151 Processing linksByURL: Copied file (4096 bytes) in 0.0020 secs. 050818 140151 Processing /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150/fetchlist.unsorted: Sorted 1 entries in 0.011 seconds. 050818 140151 Processing /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150/fetchlist.unsorted: Sorted 90.90909090909092 entries/second 050818 140151 Overall processing: Sorted 1 entries in 0.011 seconds. 050818 140151 Overall processing: Sorted 0.011 entries/second 050818 140151 FetchListTool completed 050818 140151 logging at INFO 050818 140151 fetching http://gartner.shu.edu/ 050818 140151 http.proxy.host = null 050818 140151 http.proxy.port = 8080 050818 140151 http.timeout = 10000 050818 140151 http.content.limit = 65536 050818 140151 http.agent = NutchCVS/0.7 (Nutch; http://lucene.apache.org/nutch/bot.html; [email protected]) 050818 140151 http.auth.ntlm.username = 050818 140151 fetcher.server.delay = 1000 050818 140151 http.max.delays = 100 050818 140152 Configured Client 050818 140152 basic authentication scheme selected 050818 140152 basic authentication scheme selected 050818 140153 Updating /gartner/httpd/html/nutch-0.7/crawl.test/db 050818 140154 Updating for /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150 050818 140154 Processing document 0 050818 140154 Finishing update 050818 140154 Processing pagesByURL: Sorted 1 instructions in 0.0060 seconds. 050818 140154 Processing pagesByURL: Sorted 166.66666666666666 instructions/second 050818 140154 Processing pagesByURL: Merged to new DB containing 1 records in 0.0010 seconds 050818 140154 Processing pagesByURL: Merged 1000.0 records/second 050818 140154 Processing pagesByMD5: Sorted 1 instructions in 0.0050 seconds. 050818 140154 Processing pagesByMD5: Sorted 200.0 instructions/second 050818 140154 Processing pagesByMD5: Merged to new DB containing 1 records in 0.0 seconds 050818 140154 Processing pagesByMD5: Merged Infinity records/second 050818 140154 Processing linksByMD5: Copied file (4096 bytes) in 0.0020 secs. 050818 140154 Processing linksByURL: Copied file (4096 bytes) in 0.0040 secs. 050818 140154 Update finished 050818 140154 FetchListTool started 050818 140154 Overall processing: Sorted 0 entries in 0.0 seconds. 050818 140154 Overall processing: Sorted NaN entries/second 050818 140154 FetchListTool completed 050818 140154 logging at INFO 050818 140155 Updating /gartner/httpd/html/nutch-0.7/crawl.test/db 050818 140155 Updating for /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140154 050818 140155 Finishing update 050818 140155 Update finished 050818 140155 FetchListTool started 050818 140156 Overall processing: Sorted 0 entries in 0.0 seconds. 050818 140156 Overall processing: Sorted NaN entries/second 050818 140156 FetchListTool completed 050818 140156 logging at INFO 050818 140157 Updating /gartner/httpd/html/nutch-0.7/crawl.test/db 050818 140157 Updating for /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140156 050818 140157 Finishing update 050818 140157 Update finished 050818 140157 Updating /gartner/httpd/html/nutch-0.7/crawl.test/segments from /gartner/httpd/html/nutch-0.7/crawl.test/db 050818 140157 reading /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150 050818 140157 reading /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140154 050818 140157 reading /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140156 050818 140157 Sorting pages by url... 050818 140157 Getting updated scores and anchors from db... 050818 140157 Sorting updates by segment... 050818 140157 Updating segments... 050818 140157 updating /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150 050818 140157 Done updating /gartner/httpd/html/nutch-0.7/crawl.test/segments from /gartner/httpd/html/nutch-0.7/crawl.test/db 050818 140158 indexing segment: /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150 050818 140158 * Opening segment 20050818140150 050818 140158 * Indexing segment 20050818140150 050818 140158 * Optimizing index... 050818 140158 * Moving index to NFS if needed... 050818 140158 DONE indexing segment 20050818140150: total 1 records in 0.034 s (Infinity rec/s). 050818 140158 done indexing 050818 140158 indexing segment: /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140154 050818 140158 * Opening segment 20050818140154 050818 140158 * Indexing segment 20050818140154 050818 140158 * Optimizing index... 050818 140158 * Moving index to NFS if needed... 050818 140158 DONE indexing segment 20050818140154: total 0 records in 0.046 s (NaN rec/s). 050818 140158 done indexing 050818 140158 indexing segment: /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140156 050818 140158 * Opening segment 20050818140156 050818 140158 * Indexing segment 20050818140156 050818 140158 * Optimizing index... 050818 140158 * Moving index to NFS if needed... 050818 140158 DONE indexing segment 20050818140156: total 0 records in 0.071 s (NaN rec/s). 050818 140158 done indexing 050818 140158 Reading url hashes... 050818 140158 Sorting url hashes... 050818 140158 Deleting url duplicates... 050818 140158 Deleted 0 url duplicates. 050818 140158 Reading content hashes... 050818 140158 Sorting content hashes... 050818 140158 Deleting content duplicates... 050818 140158 Deleted 0 content duplicates. 050818 140158 Duplicate deletion complete locally. Now returning to NFS... 050818 140158 DeleteDuplicates complete 050818 140158 Merging segment indexes... 050818 140158 crawl finished: crawl.test
