Hi, I was wondering how other Lucene users handle having multiple document types available for searching.
I am initially concentrating on newsgroups, so I was planning to convert all news postings into XML. There would be one document field for each header field in the posting, one document field for the whole header, and one document field for the body text. Later I would like to add HTML to the indexes, and other document types after that.

To do this, I think I need to identify things that are common across all (or most) document types, such as "author" and "topic". For a news posting I would then map the "From" and "Subject" fields to "author" and "topic", whereas for HTML I would map the "built by" (or similar) string, if it exists, to "author" and perhaps the <TITLE> to "topic".

My aim is to give users an advanced search capability over multiple document types. I'm not sure whether I am looking at the problem the correct way or, if I am, where I should do the mapping from document-specific fields such as "From" to my generic ones such as "author". I could duplicate the data, so a news posting would have both "author" and "From" fields, or I could build it into my query parsing, so that when the user enters the query "topic: jt" it gets converted to "Subject: jt | Title: jt" to match both the news and the HTML.

Any comments would be appreciated.

Thanks for reading this far!

jt
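
P.S. To make the two options concrete, here is a rough sketch of what I have in mind. It is plain Python rather than Lucene code, and everything in it is an assumption for illustration: the TO_GENERIC table, the "BuiltBy" field name for HTML, and the add_generic_fields and expand_query helpers are all hypothetical. The "|" in the output just mirrors the notation in my example and is not meant as real Lucene query syntax.

```python
import re

# Hypothetical per-type mapping: document-specific field -> generic field.
# "BuiltBy" stands in for whatever "built by" string an HTML page might carry.
TO_GENERIC = {
    "news": {"From": "author", "Subject": "topic"},
    "html": {"BuiltBy": "author", "Title": "topic"},
}

# Reverse view used at query time: generic field -> list of specific fields.
FROM_GENERIC = {}
for mapping in TO_GENERIC.values():
    for specific, generic in mapping.items():
        FROM_GENERIC.setdefault(generic, []).append(specific)


def add_generic_fields(doc_type, fields):
    """Option 1: duplicate the data at index time.

    Returns a copy of `fields` with the generic fields added alongside
    the original document-specific ones.
    """
    out = dict(fields)
    for specific, generic in TO_GENERIC.get(doc_type, {}).items():
        if specific in fields:
            out[generic] = fields[specific]
    return out


# Matches "field: term", where term is one word or a double-quoted phrase.
CLAUSE = re.compile(r'(\w+):\s*("[^"]*"|\S+)')


def expand_query(query):
    """Option 2: rewrite generic fields into specific ones at query time."""
    def rewrite(match):
        field, term = match.group(1), match.group(2)
        targets = FROM_GENERIC.get(field)
        if targets is None:
            # Not a generic field, so leave the clause untouched.
            return match.group(0)
        return " | ".join(f"{t}: {term}" for t in targets)

    return CLAUSE.sub(rewrite, query)


print(add_generic_fields("news", {"From": "jt", "Subject": "multiple doc types"}))
print(expand_query("topic: jt"))
```

With the first option the index grows but queries stay simple; with the second the index stays as-is, but every new document type means touching the query rewriting. The sketch also ignores grouping, so a query with several clauses would need parentheses around each expanded clause.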