Hi all, I am new to Lucene and Nutch. I am working on a project for an archiving web portal that allows individual users to index documents (from the file system) and to crawl websites and RSS feeds for indexing.
Given the above requirements, I thought Lucene would be able to handle it; however, I found out that Lucene does not have a crawler for fetching URLs. Then I came across Nutch, which is perfect for my latter requirement of fetching websites and RSS feeds. I also realised that Nutch allows me to crawl my file system as well. So in my case, I guess I should be using the Nutch API instead of Lucene?

From another discussion on Nabble:
http://www.nabble.com/Integration-of-Nutch-td12016441.html#a12040333
there is advice to use Lucene to index into the same index files that Nutch has created. But I thought that Nutch uses a webdb to store the crawl results?

Anyway, regarding the thread mentioned above: why would one use Lucene if Nutch can perform all of the local file system and web indexing and search functions? Please correct my brief understanding.

Steven (Singapore)
--
View this message in context: http://www.nabble.com/Lucene-or--Nutch----tp15532491p15532491.html
Sent from the Lucene - General mailing list archive at Nabble.com.
