Maybe I'm not understanding your requirement, but this should be fairly simple in Lucene.
Each document in your document management system would be represented by a single Lucene document in the index. Each Lucene document will then have several fields, each field holding one value of the "meta data" associated with your document in the document management system.

For example, let's say you have a document with the following structure:

  Title: Sample Document
  Date: 01/01/2006
  Attachment: Some attachment (this is the content of the attachment)
  Attachment: Some other attachment
  Refers: Mr X
  Refers: Mr Y

Your Lucene document would have the same structure. All of these items in your "real" document would simply be Fields in the Lucene document.

In the case of your attachments, you could consider indexing them separately as well. This way users could search for attachments without needing the documents to which they are attached.

If your attachments ARE documents (that is, you just have a foreign-key style relationship between two documents), then you would simply index each "real" document as a separate document in Lucene and add some sort of reference field which contains the IDs of all related documents. For example:

  -------------------
  ID: 123
  Title: Sample Document
  Date: 01/01/2006
  Attachment: 456
  Attachment: 789
  Refers: Mr X
  Refers: Mr Y
  --------------------
  ID: 456
  Title: Other Document
  Date:.. etc etc...

The one thing to be mindful of is re-indexing existing documents. If you have a document that is already indexed and you want to make a change (e.g. you want to add a new "refers" value), then you need to re-index the entire document. This means you need to either "store" all fields you want to keep during re-indexing (which is typically all of them), or re-index the document from its source. Storing all the data in the index can, however, have adverse effects on the performance of the index.

(Hope this makes sense.)

On 8/12/06, Shaghayegh Sahebie <[EMAIL PROTECTED]> wrote:
Hi all;

We have a document management system and we want to build a search on it. We have three kinds of content in our system: Refers, Documents and Attachments. A document can have multiple attachments and can be referred to many users. Our users want to be able to search on documents, attachments and refers. For example, they want to search for the Documents that were created on "2006/07/06", have the word "Lucene" in them or in their Refers, and are referred to Mr. X. Our users want to be able to search in all 8 possible selections of Document, Refer and Attachment. I mean they want to be able to search just in Refers, in both Refers and Documents, and so on.

How can we handle it? I thought of storing the different kinds of docs in a DB, searching the DB first, and then searching in Lucene based on the DB results and the phrases given to search (handling the Document, Refer or Attachment parts in a DB search). But the DB results may be very large, and I don't know if a Lucene query can have that many search terms.

Another way is to index each document, refer and attachment in the index 8 times (once for each possible selection of Refer, Document and Attachment), but this way has lots of redundancy, even more than 8 times, because each Document is indexed "8 * number of Refers of the Document" times.

I really don't know what to do. Any suggestions, please?

Thanks in advance
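
To make the single-document layout from the reply at the top of this message a bit more concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Lucene's API: the ToyIndex class, its add/search/reindex methods, the field names and the sample values are all hypothetical and made up for illustration. It only tries to show two points: a search can be restricted to any selection of fields without indexing anything more than once, and changing one value means rebuilding the whole document from its stored values.

```python
from collections import defaultdict


class ToyIndex:
    """Tiny in-memory stand-in for a fielded index (hypothetical, not Lucene)."""

    def __init__(self):
        # doc id -> {field name: [values]}; plays the role of "stored" fields
        self.docs = {}

    def add(self, doc_id, fields):
        """Add a document given (name, value) pairs; a name may repeat."""
        doc = defaultdict(list)
        for name, value in fields:
            doc[name].append(value)
        self.docs[doc_id] = dict(doc)

    def search(self, word, in_fields):
        """Return ids of documents containing `word` in any of `in_fields`."""
        word = word.lower()
        hits = []
        for doc_id, doc in self.docs.items():
            if any(word in value.lower().split()
                   for name in in_fields
                   for value in doc.get(name, [])):
                hits.append(doc_id)
        return sorted(hits)

    def reindex(self, doc_id, extra_fields):
        """Drop the old document and re-add it whole, plus the new values."""
        old = [(name, value)
               for name, values in self.docs.pop(doc_id).items()
               for value in values]
        self.add(doc_id, old + extra_fields)


index = ToyIndex()
index.add("123", [
    ("title", "Sample Document"),
    ("date", "01/01/2006"),
    ("attachment", "Notes on Lucene indexing"),
    ("attachment", "Some other attachment"),
    ("refers", "Mr X"),
    ("refers", "Mr Y"),
])
index.add("456", [
    ("title", "Other Document about Lucene"),
    ("refers", "Mr Z"),
])

# The same index answers "attachments only" and "all three kinds".
hits_attachments = index.search("lucene", ["attachment"])
hits_any = index.search("lucene", ["title", "attachment", "refers"])

# Adding one new "refers" value rebuilds document 123 from its stored fields.
index.reindex("123", [("refers", "Mr W")])
```

In this sketch the selection of Document, Refer and Attachment is just the list of field names passed to search, so none of the 8 combinations needs its own copy of the data. In a real Lucene index the same idea would be expressed by building the query over the chosen fields.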