I would add:

1. Read the demos: from memory, there is a demo app in there for creating an 
index from external documents.

2. Look on codeproject.com for IFilter wrappers. This is a great way to break 
up Office documents, PDFs, etc. into just the words, which Lucene can index. The 
wrappers are not always thread-safe, so you may want to run them in a Windows 
service or otherwise serialize access to them, but it does work. IFilters come 
with Windows (DOC, XLS, etc.), can be installed separately (PDF, ZIP, etc.), or 
come with SharePoint/Windows SharePoint Services (DOCX, XLSX and others).

I've done this as part of Archive Manager (www.quest.com). We indexed all 
incoming messages and attachments using Lucene.Net. The largest customer I can 
recall had around 20 million emails and perhaps 15 million attachments (we 
single-instance them based on an MD5 hash), and that was 12 months ago and still 
growing. Performance of the index was outstanding.
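Single-instancing by hash means each distinct attachment is stored, extracted 
and indexed only once, however many messages carry it. A rough sketch in 
Python; AttachmentStore and its fields are hypothetical names for illustration, 
not anything taken from Archive Manager:

```python
import hashlib


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class AttachmentStore:
    """Hypothetical store keeping one copy of each distinct attachment,
    keyed by the MD5 hash of its bytes."""

    def __init__(self):
        self.blobs = {}    # md5 -> bytes (the single stored instance)
        self.refs = {}     # md5 -> ids of messages that carry it
        self.indexed = []  # md5s handed to the indexer, in order

    def add(self, message_id: str, data: bytes) -> str:
        key = md5_of(data)
        if key not in self.blobs:
            # First sighting: store it and extract/index it exactly once.
            self.blobs[key] = data
            self.indexed.append(key)
        # Every message still records a reference to the shared copy.
        self.refs.setdefault(key, []).append(message_id)
        return key


store = AttachmentStore()
a = store.add("msg-1", b"quarterly report")
b = store.add("msg-2", b"quarterly report")  # duplicate content
c = store.add("msg-3", b"holiday photo")
```

Here the duplicate report is stored and indexed once, but both msg-1 and msg-2 
still point at it.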

It's not as simple as pointing Lucene at your folder of documents, but it's not 
hard, either. If you want point-and-index, look at the MS index engine, which 
does that. Of course, it's nowhere near as flexible as Lucene, and it's harder 
to integrate.
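The plumbing you write yourself is roughly: walk the folder, pick an extractor 
by file type, turn the text into terms and hand them to the index. A toy Python 
sketch under stated assumptions: the EXTRACTORS table and the helper functions 
are hypothetical, only .txt files are handled, and a plain dict of term -> files 
stands in for a real Lucene index:

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path


def extract_plain(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


# Hypothetical extractor table keyed by extension. Real entries for .doc,
# .pdf, .xlsx etc. would wrap IFilters or another converter.
EXTRACTORS = {".txt": extract_plain}


def tokenize(text):
    # Lower-case word tokens; Lucene's analyzers do this far more carefully.
    return re.findall(r"\w+", text.lower())


def index_folder(root):
    index = defaultdict(set)  # term -> set of file paths (toy index)
    skipped = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        extract = EXTRACTORS.get(path.suffix.lower())
        if extract is None:
            skipped.append(path)  # no extractor for this format yet
            continue
        for term in tokenize(extract(path)):
            index[term].add(str(path))
    return index, skipped


def search(index, term):
    return sorted(index.get(term.lower(), set()))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("Lucene indexes words", encoding="utf-8")
    (root / "b.txt").write_text("Words and more words", encoding="utf-8")
    (root / "c.pdf").write_bytes(b"%PDF-1.4 not handled here")
    index, skipped = index_folder(root)
    hits = [Path(p).name for p in search(index, "words")]
    lucene_hits = [Path(p).name for p in search(index, "Lucene")]
    skipped_names = [p.name for p in skipped]
```

The skipped list is where an IFilter-based extractor would plug in for the 
formats that plain text reading cannot handle.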

The book is good, though.

-----Original Message-----
From: Dean Harding [mailto:[EMAIL PROTECTED] 
Sent: 17 January 2008 06:39
To: [email protected]
Subject: Re: Urgent Help Required

Chirag Patel wrote:
> Hello,
> 
> This is Chirag Patel.
> 
> I want to use lucene.net as a search engine with our web application.
> 
> We have extensive search requirements like search on PDF, Doc, HTML etc.

I suggest you pick up a copy of the "Lucene in Action" book 
(http://www.manning.com/hatcher2/).

It explains everything you need to do what you want.

Dean. 

This e-mail has been sent by one of the following wholly-owned subsidiaries of 
the BBC:
 
BBC Worldwide, Registration Number: 1420028 England, Registered Address: 
Woodlands, 80 Wood Lane, London W12 0TT
BBC World, Registration Number: 04514407 England, Registered Address: 
Woodlands, 80 Wood Lane, London W12 0TT
BBC World Distribution Limited, Registration Number: 04514408, Registered 
Address: Woodlands, 80 Wood Lane, London W12 0TT
