I would add:

1. Read the demos. From memory, there is a demo app in there for creating an index from external docs.
2. Look on codeproject.com for IFilter wrappers. This is a great way to break up Office docs, PDFs etc. into just the words, which Lucene can index. It's not always totally thread-safe, so you may want to put it into a Windows service or otherwise serialize it, but it does work. IFilters come with Windows (DOC, XLS etc.), can be installed separately (PDF, ZIP etc.), or come with SharePoint/Windows SharePoint Services (DOCX, XLSX et al.).

I've done this as part of Archive Manager (www.quest.com). We indexed all the incoming attachments and messages using Lucene.Net. The largest customer I can recall (this was 12 months ago, and it was still growing) had around 20 million emails and maybe 15 million attachments (we single-instance based on an MD5 hash). Performance of the index was outstanding.

It's not as simple as pointing Lucene at your folder of documents, but it's not hard, either. If you want point-and-index, look at the MS index engine, which does that. Of course, it's nowhere near as flexible as Lucene, and harder to integrate.

The book is good, though.

-----Original Message-----
From: Dean Harding [mailto:[EMAIL PROTECTED]
Sent: 17 January 2008 06:39
To: [email protected]
Subject: Re: Urgent Help Required

Chirag Patel wrote:
> Hello,
>
> This is Chirag patel.
>
> I want to use lucene.net as a search engine with our web application.
>
> We have extensive search requirements like search on PDF, Doc, HTML etc.

I suggest you pick up a copy of the "Lucene in Action" book (http://www.manning.com/hatcher2/). It explains everything you need to do what you want.

Dean.
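
For what it's worth, two points from the reply above (single-instancing attachments on an MD5 hash, and serializing calls into a text extractor that may not be thread-safe) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not Archive Manager's code and it does not use the Lucene.Net or IFilter APIs. The names `AttachmentStore` and `extract_text` are hypothetical, and the extractor here is a stand-in for whatever IFilter wrapper you end up using.

```python
import hashlib
import threading


class AttachmentStore:
    """Toy single-instance store: one extracted-text entry per unique MD5.

    `extract_text` is a hypothetical callable (bytes -> str) standing in
    for an IFilter wrapper that may not be safe to call concurrently.
    """

    def __init__(self, extract_text):
        self._extract_text = extract_text
        self._lock = threading.Lock()  # serializes all extractor calls
        self._texts = {}               # MD5 hex digest -> extracted text

    def add(self, data: bytes) -> str:
        """Store an attachment and return its MD5 hex digest.

        The extractor runs only the first time a given digest is seen.
        """
        digest = hashlib.md5(data).hexdigest()
        with self._lock:
            # Check and extract under one lock, so two threads can never
            # be inside the extractor at the same time.
            if digest not in self._texts:
                self._texts[digest] = self._extract_text(data)
        return digest

    def text_for(self, digest: str) -> str:
        """Return the text extracted for a previously added attachment."""
        return self._texts[digest]

    def __len__(self) -> int:
        return len(self._texts)


def fake_extract(data: bytes) -> str:
    """Placeholder extractor: treats the attachment as UTF-8 text."""
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    store = AttachmentStore(fake_extract)
    first = store.add(b"quarterly report")
    second = store.add(b"quarterly report")  # same bytes, same digest
    print(first == second, len(store))
```

The extracted text for each unique digest is what you would then hand to the indexer. Holding one lock around the whole check-and-extract step is the simplest way to serialize; it trades throughput for safety, which is roughly what moving the extractor into a single service does as well.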
