I'm creating a Lucene index using an XSP based on the sample, but I have a strange 
problem.

Some of the pages are crawled but others are not, and I can't see why. 

I have DEBUG logging for the core.search components, so I can see the crawler crawling 
the site. I can see it read the links for each page, and I can see that it doesn't 
exclude any of the links. Yet it doesn't actually follow those links - the crawl 
simply comes to an end at some point, with some of the links uncrawled.

It seems to me that for every log entry from SimpleCocoonCrawlerImpl that says "Add 
URL: http://blah...", I should also have an entry from SimpleLuceneXMLIndexerImpl that 
says "Indexing http://blah...".
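One way to check that expectation is to pull both sets of URLs out of the log and list the ones that were added but never indexed. The following is only a rough Python sketch. It assumes each log entry sits on a single line in the log file, that the URL directly follows "Add URL: " or "Indexing ", and that the query string (such as ?cocoon-view=content) can be ignored when comparing. The function name unindexed_urls is made up for illustration.

```python
import re

# Assumed message formats, based only on the log excerpts quoted in this message.
ADD_RE = re.compile(r"Add URL: (\S+)")
INDEX_RE = re.compile(r"Indexing (https?://\S+)")


def normalize(url):
    # Assumption: the indexer appends a query string such as
    # "?cocoon-view=content", so drop it before comparing.
    return url.split("?", 1)[0]


def unindexed_urls(log_lines):
    """Return URLs that were added by the crawler but never indexed,
    in the order they were first added."""
    added = []
    indexed = set()
    for line in log_lines:
        match = ADD_RE.search(line)
        if match:
            url = normalize(match.group(1))
            if url not in added:
                added.append(url)
            continue
        match = INDEX_RE.search(line)
        if match:
            indexed.add(normalize(match.group(1)))
    return [url for url in added if url not in indexed]
```

Passing an open log file (or any iterable of lines) to unindexed_urls should give the list of pages the crawler queued but the indexer never reported, which at least shows where in the site the crawl stopped.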

The home page is crawled, and all of the pages off that page, and SOME of the pages 
off those pages, and SOME of the pages off THOSE pages. I can't see why some pages are 
crawled and others not. Perhaps the crawler simply stops before it has finished its 
list of URLs. But why would it stop crawling without logging any error? 
BTW, the last entry in the log is always the SimpleLuceneXMLIndexerImpl reporting that 
it has indexed a page, e.g.: 

DEBUG   (2003-06-09) 17:32.05:388   [core.search.lucene] (/search/reindex.xml) 
HttpProcessor[80][4]/SimpleLuceneXMLIndexerImpl: Indexing 
http://localhost:80/etexts/JCB-016/full.html?cocoon-view=content (text/xml)

Does anyone have any ideas where I could start looking?

I'm using Cocoon version RELEASE_2_1_M_2.

Thanks

Con
