I spent about 30 minutes trying to figure out how to submit a bug via JIRA.
There must be a way, but it's not shown on any of the JIRA pages I clicked on.
Anyway, here's the bug report:
Component: indexer
Priority: major
After running for several hours on the intranet, the Nutch indexer crashed.
The crawl was started as described under 'Intranet crawling' in the Nutch
tutorial. This has happened only once in about 10 crawls.
...Output from intranet crawling for about 1 hour
050424 183642 Updating C:\Temp\crawl.temp\db
050424 183642 Updating for C:\Temp\crawl.temp\segments\20050424183635
050424 183642 Processing document 0
050424 183642 Finishing update
050424 183642 Processing pagesByURL: Sorted 69 instructions in 0.0 seconds.
050424 183642 Processing pagesByURL: Sorted Infinity instructions/second
java.io.IOException: already exists: C:\Temp\crawl.temp\db\webdb.new\pagesByURL
at net.nutch.io.MapFile$Writer.<init>(MapFile.java:67)
at net.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:536)
at net.nutch.db.WebDBWriter.close(WebDBWriter.java:1531)
at net.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:301)
at net.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:351)
at net.nutch.tools.CrawlTool.main(CrawlTool.java:128)
Exception in thread "main"
At this point the intranet crawler stopped (prematurely).