I am trying to set up Nutch to search an intranet.  I am using Nutch 0.7 
with J2SE 1.4.2 and Tomcat 4.1.31.

I did the crawl with the command

bin/nutch crawl bin/urls.txt -dir crawl.test -depth 3 >& crawl.log


The messages in crawl.log appear to indicate a successful run.  (crawl.log 
is copied after the Java/JSP errors below.)

I set JAVA_HOME and NUTCH_JAVA_HOME to the J2RE when I ran the crawl, but I 
set JAVA_HOME to the J2SE when I started Tomcat.  I then went to 
http://localhost:8080 and tried a search, and I got the NutchBean error 
shown below.

Did I configure something wrong?  How can I fix this?


Diane Palla
Web Services Developer
Seton Hall University
973 313-6199
[EMAIL PROTECTED]



org.apache.jasper.JasperException
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:207)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:240)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:187)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:809)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:200)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:146)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:209)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:144)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948)
        at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2358)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:133)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596)
        at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:118)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:594)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:116)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:594)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:127)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948)
        at org.apache.coyote.tomcat4.CoyoteAdapter.service(CoyoteAdapter.java:152)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:799)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:705)
        at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:577)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
        at java.lang.Thread.run(Thread.java:534)

root cause
java.lang.NullPointerException
        at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:96)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:82)
        at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:72)
        at org.apache.nutch.searcher.NutchBean.get(NutchBean.java:64)
        at org.apache.jsp.search_jsp._jspService(search_jsp.java:108)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:92)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:809)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:162)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:240)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:187)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:809)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:200)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:146)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:209)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:144)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948)
        at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2358)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:133)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596)
        at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:118)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:594)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:116)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:594)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:127)
        at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:596)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:433)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:948)
        at org.apache.coyote.tomcat4.CoyoteAdapter.service(CoyoteAdapter.java:152)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:799)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:705)
        at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:577)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
        at java.lang.Thread.run(Thread.java:534)



Crawl.log:

run java in /usr/java/j2re1.4.2_02
050818 140148 parsing file:/gartner/httpd/html/nutch-0.7/conf/nutch-default.xml
050818 140149 parsing file:/gartner/httpd/html/nutch-0.7/conf/crawl-tool.xml
050818 140149 parsing file:/gartner/httpd/html/nutch-0.7/conf/nutch-site.xml
050818 140149 No FS indicated, using default:local
050818 140149 crawl started in: crawl.test
050818 140149 rootUrlFile = bin/urls.txt
050818 140149 threads = 10
050818 140149 depth = 3
050818 140149 Created webdb at LocalFS,/gartner/httpd/html/nutch-0.7/crawl.test/db
050818 140149 Starting URL processing
050818 140149 Plugins: looking in: /gartner/httpd/html/nutch-0.7/plugins
050818 140149 not including: /gartner/httpd/html/nutch-0.7/plugins/clustering-carrot2
050818 140149 not including: /gartner/httpd/html/nutch-0.7/plugins/creativecommons
050818 140149 parsing: /gartner/httpd/html/nutch-0.7/plugins/index-basic/plugin.xml
050818 140150 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/index-more
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/language-identifier
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/ontology
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/parse-ext
050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/parse-html/plugin.xml
050818 140150 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/parse-js/plugin.xml
050818 140150 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.js.JSParseFilter
050818 140150 impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parse.js.JSParseFilter
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/parse-msword
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/parse-pdf
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/parse-rss
050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/parse-text/plugin.xml
050818 140150 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/protocol-file
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/protocol-ftp
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/protocol-http
050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/protocol-httpclient/plugin.xml
050818 140150 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
050818 140150 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/query-basic/plugin.xml
050818 140150 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/query-more
050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/query-site/plugin.xml
050818 140150 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/query-url/plugin.xml
050818 140150 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
050818 140150 not including: /gartner/httpd/html/nutch-0.7/plugins/urlfilter-prefix
050818 140150 parsing: /gartner/httpd/html/nutch-0.7/plugins/urlfilter-regex/plugin.xml
050818 140150 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
050818 140150 found resource crawl-urlfilter.txt at file:/gartner/httpd/html/nutch-0.7/conf/crawl-urlfilter.txt
050818 140150 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
050818 140150 Added 1 pages
050818 140150 Processing pagesByURL: Sorted 1 instructions in 0.014 seconds.
050818 140150 Processing pagesByURL: Sorted 71.42857142857143 instructions/second
050818 140150 Processing pagesByURL: Merged to new DB containing 1 records in 0.0070 seconds
050818 140150 Processing pagesByURL: Merged 142.85714285714286 records/second
050818 140150 Processing pagesByMD5: Sorted 1 instructions in 0.0020 seconds.
050818 140150 Processing pagesByMD5: Sorted 500.0 instructions/second
050818 140150 Processing pagesByMD5: Merged to new DB containing 1 records in 0.0030 seconds
050818 140150 Processing pagesByMD5: Merged 333.3333333333333 records/second
050818 140150 Processing linksByMD5: Copied file (4096 bytes) in 0.01 secs.
050818 140150 Processing linksByURL: Copied file (4096 bytes) in -0.0020 secs.
050818 140150 FetchListTool started
050818 140151 Processing pagesByURL: Sorted 1 instructions in 0.106 seconds.
050818 140151 Processing pagesByURL: Sorted 9.433962264150944 instructions/second
050818 140151 Processing pagesByURL: Merged to new DB containing 1 records in 0.0 seconds
050818 140151 Processing pagesByURL: Merged Infinity records/second
050818 140151 Processing pagesByMD5: Sorted 1 instructions in 0.0020 seconds.
050818 140151 Processing pagesByMD5: Sorted 500.0 instructions/second
050818 140151 Processing pagesByMD5: Merged to new DB containing 1 records in 0.0020 seconds
050818 140151 Processing pagesByMD5: Merged 500.0 records/second
050818 140151 Processing linksByMD5: Copied file (4096 bytes) in 0.0010 secs.
050818 140151 Processing linksByURL: Copied file (4096 bytes) in 0.0020 secs.
050818 140151 Processing /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150/fetchlist.unsorted: Sorted 1 entries in 0.011 seconds.
050818 140151 Processing /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150/fetchlist.unsorted: Sorted 90.90909090909092 entries/second
050818 140151 Overall processing: Sorted 1 entries in 0.011 seconds.
050818 140151 Overall processing: Sorted 0.011 entries/second
050818 140151 FetchListTool completed
050818 140151 logging at INFO
050818 140151 fetching http://gartner.shu.edu/
050818 140151 http.proxy.host = null
050818 140151 http.proxy.port = 8080
050818 140151 http.timeout = 10000
050818 140151 http.content.limit = 65536
050818 140151 http.agent = NutchCVS/0.7 (Nutch; http://lucene.apache.org/nutch/bot.html; [email protected])
050818 140151 http.auth.ntlm.username = 
050818 140151 fetcher.server.delay = 1000
050818 140151 http.max.delays = 100
050818 140152 Configured Client
050818 140152 basic authentication scheme selected
050818 140152 basic authentication scheme selected
050818 140153 Updating /gartner/httpd/html/nutch-0.7/crawl.test/db
050818 140154 Updating for /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150
050818 140154 Processing document 0
050818 140154 Finishing update
050818 140154 Processing pagesByURL: Sorted 1 instructions in 0.0060 seconds.
050818 140154 Processing pagesByURL: Sorted 166.66666666666666 instructions/second
050818 140154 Processing pagesByURL: Merged to new DB containing 1 records in 0.0010 seconds
050818 140154 Processing pagesByURL: Merged 1000.0 records/second
050818 140154 Processing pagesByMD5: Sorted 1 instructions in 0.0050 seconds.
050818 140154 Processing pagesByMD5: Sorted 200.0 instructions/second
050818 140154 Processing pagesByMD5: Merged to new DB containing 1 records in 0.0 seconds
050818 140154 Processing pagesByMD5: Merged Infinity records/second
050818 140154 Processing linksByMD5: Copied file (4096 bytes) in 0.0020 secs.
050818 140154 Processing linksByURL: Copied file (4096 bytes) in 0.0040 secs.
050818 140154 Update finished
050818 140154 FetchListTool started
050818 140154 Overall processing: Sorted 0 entries in 0.0 seconds.
050818 140154 Overall processing: Sorted NaN entries/second
050818 140154 FetchListTool completed
050818 140154 logging at INFO
050818 140155 Updating /gartner/httpd/html/nutch-0.7/crawl.test/db
050818 140155 Updating for /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140154
050818 140155 Finishing update
050818 140155 Update finished
050818 140155 FetchListTool started
050818 140156 Overall processing: Sorted 0 entries in 0.0 seconds.
050818 140156 Overall processing: Sorted NaN entries/second
050818 140156 FetchListTool completed
050818 140156 logging at INFO
050818 140157 Updating /gartner/httpd/html/nutch-0.7/crawl.test/db
050818 140157 Updating for /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140156
050818 140157 Finishing update
050818 140157 Update finished
050818 140157 Updating /gartner/httpd/html/nutch-0.7/crawl.test/segments from /gartner/httpd/html/nutch-0.7/crawl.test/db
050818 140157  reading /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150
050818 140157  reading /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140154
050818 140157  reading /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140156
050818 140157 Sorting pages by url...
050818 140157 Getting updated scores and anchors from db...
050818 140157 Sorting updates by segment...
050818 140157 Updating segments...
050818 140157  updating /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150
050818 140157 Done updating /gartner/httpd/html/nutch-0.7/crawl.test/segments from /gartner/httpd/html/nutch-0.7/crawl.test/db
050818 140158 indexing segment: /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140150
050818 140158 * Opening segment 20050818140150
050818 140158 * Indexing segment 20050818140150
050818 140158 * Optimizing index...
050818 140158 * Moving index to NFS if needed...
050818 140158 DONE indexing segment 20050818140150: total 1 records in 0.034 s (Infinity rec/s).
050818 140158 done indexing
050818 140158 indexing segment: /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140154
050818 140158 * Opening segment 20050818140154
050818 140158 * Indexing segment 20050818140154
050818 140158 * Optimizing index...
050818 140158 * Moving index to NFS if needed...
050818 140158 DONE indexing segment 20050818140154: total 0 records in 0.046 s (NaN rec/s).
050818 140158 done indexing
050818 140158 indexing segment: /gartner/httpd/html/nutch-0.7/crawl.test/segments/20050818140156
050818 140158 * Opening segment 20050818140156
050818 140158 * Indexing segment 20050818140156
050818 140158 * Optimizing index...
050818 140158 * Moving index to NFS if needed...
050818 140158 DONE indexing segment 20050818140156: total 0 records in 0.071 s (NaN rec/s).
050818 140158 done indexing
050818 140158 Reading url hashes...
050818 140158 Sorting url hashes...
050818 140158 Deleting url duplicates...
050818 140158 Deleted 0 url duplicates.
050818 140158 Reading content hashes...
050818 140158 Sorting content hashes...
050818 140158 Deleting content duplicates...
050818 140158 Deleted 0 content duplicates.
050818 140158 Duplicate deletion complete locally.  Now returning to NFS...
050818 140158 DeleteDuplicates complete
050818 140158 DeleteDuplicates complete
050818 140158 Merging segment indexes... 
050818 140158 crawl finished: crawl.test
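
Because the log is long, its "successful run" reading can be checked 
mechanically rather than by eye.  The following is a minimal Python sketch 
and is not part of Nutch: the function names are made up for illustration, 
and it assumes only the "DONE indexing segment" and "crawl finished:" 
message formats that appear in the log above.

```python
import re

# Matches log lines such as:
#   050818 140158 DONE indexing segment 20050818140150: total 1 records in
# The pattern is an assumption based only on the log shown in this post.
DONE_RE = re.compile(r"DONE indexing segment (\d+): total (\d+) records")


def indexed_records(log_text):
    """Return {segment_name: record_count} for each indexed segment."""
    return {segment: int(count) for segment, count in DONE_RE.findall(log_text)}


def crawl_finished(log_text):
    """Return True if the log contains the final 'crawl finished:' message."""
    return "crawl finished:" in log_text
```

Applied to the log above, this would report one record for segment 
20050818140150 and zero for the other two segments, so the crawl completed 
but indexed only a single page.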
