We're planning to archive email over many years and have been looking at two options: using a DB to store mail metadata with Lucene for the indexed mail data, or using Lucene on its own, with email data and structure stored as XML and the raw message stored in the file system.

For some customers, the volumes are likely to be well over 1 billion mails over 10 years, so some partitioning of the data is needed. At the moment our thinking is moving away from DB + Lucene towards just Lucene, alongside a file system representation of the complete message. All searches would run against the index, and the XML mail metadata would then be loaded from the file system.

The archive is read-only apart from bulk deletes, but one of the requirements is for users to be able to label their own mail. Given that a Lucene Document cannot be updated in place, I have thought about having a separate Lucene index that holds just three fields (or some combination of them): userId, mailId and label.

That would of course mean joining search results from the main mail data index with those from the label index.
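
To make the join concrete, here is a rough sketch of what I have in mind. It uses plain Java collections as a stand-in for the label index, so none of it is Lucene API; the class name LabelIndex and its methods are made up for illustration, and it assumes every mail has a unique mailId that is stored in both indexes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the separate label index.
// Each entry corresponds to one (userId, mailId, label) record.
final class LabelIndex {
    // userId -> label -> mailIds carrying that label for that user
    private final Map<String, Map<String, Set<String>>> byUser = new HashMap<>();

    // Adding a label only touches the label index; the main mail index stays read-only.
    void addLabel(String userId, String mailId, String label) {
        byUser.computeIfAbsent(userId, u -> new HashMap<>())
              .computeIfAbsent(label, l -> new HashSet<>())
              .add(mailId);
    }

    // Removing a label is likewise local to the label index.
    void removeLabel(String userId, String mailId, String label) {
        Map<String, Set<String>> labels = byUser.get(userId);
        if (labels == null) {
            return;
        }
        Set<String> mailIds = labels.get(label);
        if (mailIds != null) {
            mailIds.remove(mailId);
        }
    }

    // All mailIds that the given user has tagged with the given label.
    Set<String> mailIdsFor(String userId, String label) {
        Map<String, Set<String>> labels = byUser.get(userId);
        if (labels == null) {
            return Collections.emptySet();
        }
        Set<String> mailIds = labels.get(label);
        return mailIds == null ? Collections.<String>emptySet() : mailIds;
    }

    // The "join": keep only those main-index hits (already in rank order)
    // whose mailId carries the label for this user. Rank order is preserved.
    List<String> filterHits(List<String> rankedMailIds, String userId, String label) {
        Set<String> labelled = mailIdsFor(userId, label);
        List<String> result = new ArrayList<>();
        for (String mailId : rankedMailIds) {
            if (labelled.contains(mailId)) {
                result.add(mailId);
            }
        }
        return result;
    }
}
```

In the real thing, rankedMailIds would come from a search on the main mail index and mailIdsFor would be a search on the label index. My worry is what this intersection costs when either side returns a very large number of hits.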

Does anyone have experience of using Lucene this way, and is it a realistic way of avoiding the DB altogether? I'd rather have the headache of scaling just Lucene, which is a simple beast, than take on the whole bundle of 'stuff' that comes with a database as well.

Antony



