I saw a post that sort of touched on my question, I think, but it didn't seem quite the same...
What's the best way to index a document with multiple values for the same field? I'm trying to optimize search time and accuracy.

We have a database of companies that we want to be able to search on, and the fields will include company name, address, and telephone number. Some companies have more than one name, though. For example, BMG is also known as Bertelsmann Music Group. Our users need to be able to search on either of these names and find a match. In our raw data, these different names are in separate fields for alternate names.

Which of these is the better way to implement this in Lucene?

A) Duplicate documents, using all the same data except for the name (i.e. 1 document for BMG at 123 fake street and 1 document for Bertelsmann Music Group at 123 fake street).

B) Create 5 fields for alternate names (which 80% of companies don't have at all, so they'd be empty) and then, when doing a search query, search for the same thing across all 6 fields (i.e. name:BMG OR altname1:BMG OR altname2:BMG... etc).

C) Put all of the alternate names together into the name field (i.e. BMG Bertelsmann Music Group). Is there anything to delimit the different names with so that they would be treated as separate entities?
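
To make option B concrete, here is roughly how I picture the query string being put together. This is only a sketch: AltNameQuery is a made-up helper of mine, not anything from Lucene, and it assumes the fields are called name and altname1 through altname5 as in the example above. Multi-word names are wrapped in double quotes so they are searched as a phrase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: it only builds the option B
// query string so the shape of the search is easy to see.
class AltNameQuery {
    // Assumed layout: one "name" field plus altname1..altname5.
    static final int ALT_NAME_FIELDS = 5;

    // Returns e.g. name:BMG OR altname1:BMG OR ... OR altname5:BMG
    static String build(String term) {
        // Quote multi-word names so they are treated as a phrase.
        // Escaping of special characters inside the term is not handled here.
        String value = term.contains(" ") ? "\"" + term + "\"" : term;

        List<String> clauses = new ArrayList<>();
        clauses.add("name:" + value);
        for (int i = 1; i <= ALT_NAME_FIELDS; i++) {
            clauses.add("altname" + i + ":" + value);
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(build("BMG"));
        System.out.println(build("Bertelsmann Music Group"));
    }
}
```

Every search ends up with six clauses, even though most companies only have a value in the name field, which is part of why I'm unsure whether B is the right choice.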
