I've been trying to figure out whether it's possible to use Zend_Search_Lucene together with Apache Nutch. Nutch includes a crawler and can parse many formats such as HTML and PDF, so it would be perfect for my case.
The docs say Zend_Search_Lucene (ZSL) supports Lucene index formats 1.9 to 2.0. According to the change list for the latest Nutch version (0.9), Nutch uses Lucene 2.0.0. Even so, I haven't been able to get ZSL to open the indexes. When I call open() on the index, ZSL fails with:

Fatal error: Uncaught exception 'Zend_Search_Lucene_Exception' with message 'File 'data/index/_0.cfs' is not readable.'

Does anyone have any insight into this? Or can you suggest a different crawler solution?
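The message says the file is "not readable" rather than reporting an unsupported index format, so plain filesystem access is worth ruling out first. Below is a minimal sketch of a hypothetical Python helper. It is not part of Zend_Search_Lucene or Nutch. It tries to open every regular file in the index directory and returns the names of any that fail. It assumes the index lives in a single directory such as data/index, and that a relative path is resolved against the current working directory.

```python
from pathlib import Path


def unreadable_index_files(index_dir):
    """Return names of files in index_dir that this process cannot read.

    Hypothetical diagnostic helper: it attempts to open each regular file
    (e.g. _0.cfs, segments) in the given Lucene index directory. A relative
    index_dir is resolved against the current working directory.
    """
    directory = Path(index_dir)
    if not directory.is_dir():
        # A wrong relative path is a common reason for "not readable" errors.
        raise FileNotFoundError(f"index directory not found: {directory.resolve()}")

    problems = []
    for entry in sorted(directory.iterdir()):
        if not entry.is_file():
            continue
        try:
            # Actually opening the file catches permission/ownership problems
            # that a simple existence check would miss.
            with entry.open("rb") as handle:
                handle.read(1)
        except OSError:
            problems.append(entry.name)
    return problems
```

Run it as the same user and from the same working directory as the PHP script, for example print(unreadable_index_files("data/index")). An empty list suggests permissions and paths are not the problem, which would point back toward an index format mismatch.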
