: to index remote HTML files. Can I use Nutch to crawl for the remote HTML
: files and use the index for the Lucene code I have already written? Or do
: I have to redo the whole thing using the Nutch API? I am using boosting
: during the indexing. I hope Nutch can boost fields, too. Any help would
: be appreciated.
The best place to start with a question like this is the Nutch documentation and user community. Between those two information sources, you should be able to determine what constraints Nutch puts on the fields of the index it creates, and how much flexibility you have to affect field/document boosts at index time.

With that information in hand, you can make an informed choice between using Nutch in conjunction with your direct Lucene access code, rewriting your code to use whatever API Nutch has, or using a third-party crawler to fetch documents for your Lucene-based code and ignoring Nutch.

-Hoss