hi all, I have been playing with Lucene for a while now, but stuck on a perplexing issue.
I have an index, with a field "Affiliation", some example values are: - "Stanford University School of Medicine, Palo Alto, CA USA", - "Institute of Neurobiology, School of Medicine, Stanford University, Palo Alto, CA", - "School of Medicine, Harvard University, Boston MA", - "Brigham & Women's, Harvard University School of Medicine, Boston, MA" - "Harvard University, Cambridge MA" and so on... (the bottom-line being the affiliations are written in multiple ways with no apparent consistency) I query the index on the affiliation field using say "School of Medicine, Stanford University, Palo Alto, CA" (with QueryParser) to find all Stanford related documents, I get a lot of false +ves, presumably because of the presence of School of Medicine etc. etc. (note: I cannot use Phrase query because of variability in the way affiliation is constructed) I have tried the following: 1. Use a SpanNearQuery by splitting the search phrase with a whitespace (here I get no results!) 2. Tried boosting (using ^) by splitting with the comma and boosting the last parts such as "Palo Alto CA" with a much higher boost than the initial phrases. Here I still get lots of false +ves. Any suggestions on how to approach this? Is SpanNear the way to go? Any other ideas on why I get 0 results? Thanks in advance for helping a newbie. AS --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org