> hi all, I have been playing with Lucene for a while now, but stuck on a
> perplexing issue.
>
> I have an index, with a field "Affiliation", some example values are:
>
> - "Stanford University School of Medicine, Palo Alto, CA USA",
> - "Institute of Neurobiology, School of Medicine, Stanford University,
>   Palo Alto, CA",
> - "School of Medicine, Harvard University, Boston MA",
> - "Brigham & Women's, Harvard University School of Medicine, Boston, MA"
> - "Harvard University, Cambridge MA"
>
> and so on... (the bottom-line being the affiliations are written in
> multiple ways with no apparent consistency)
>
> I query the index on the affiliation field using say "School of Medicine,
> Stanford University, Palo Alto, CA" (with QueryParser) to find all
> Stanford related documents, I get a lot of false +ves, presumably because
> of the presence of School of Medicine etc. etc. (note: I cannot use Phrase
> query because of variability in the way affiliation is constructed)
>
> I have tried the following:
>
> 1. Use a SpanNearQuery by splitting the search phrase with a whitespace
>    (here I get no results!)
> 2. Tried boosting (using ^) by splitting with the comma and boosting the
>    last parts such as "Palo Alto CA" with a much higher boost than the
>    initial phrases. Here I still get lots of false +ves.
>
> Any suggestions on how to approach this? Is SpanNear the way to go? Any
> other ideas on why I get 0 results?
>
> Thanks in advance for helping a newbie.

If I were you, I would start with AND as the default operator, so that 100%
of the clauses must match (parser here is your QueryParser instance):

    parser.setDefaultOperator(QueryParser.Operator.AND);

If that query does not return any documents, I would switch the default
operator to OR and require about 80% of the optional clauses to match. If
that still returns nothing, I would lower the percentage step by step, let's
say down to 50%. The threshold is set on the parsed query. Note that
setMinimumNumberShouldMatch() takes an absolute number of clauses, not a
percentage, so it has to be computed from the clause count:

    Query q = parser.parse("School of Medicine, Stanford University, Palo Alto, CA");
    if (q instanceof BooleanQuery) {
        BooleanQuery bq = (BooleanQuery) q;
        // 80% of the optional clauses, rounded up
        bq.setMinimumNumberShouldMatch((int) Math.ceil(0.8 * bq.getClauses().length));
    }

This uses the mutable BooleanQuery API; newer Lucene releases set the same
value through BooleanQuery.Builder instead.
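
To make the fallback idea concrete, here is a minimal sketch in plain Java. It does not call Lucene at all: the class RelaxedMatch and its methods are hypothetical stand-ins, the tokenizer is a simple split on non-alphanumeric characters, and the 100/80/70/60/50 steps are just one possible schedule. It only illustrates how a percentage becomes a clause count and how the threshold is relaxed until something matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the index; not part of Lucene.
final class RelaxedMatch {

    // Lowercase the text and split on anything that is not a letter or digit.
    static Set<String> tokens(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Turn "percent of the optional clauses" into an absolute count, rounded up.
    static int minShouldMatch(int clauseCount, int percent) {
        return (int) Math.ceil(clauseCount * percent / 100.0);
    }

    // Keep the documents that share at least `required` distinct terms with the query.
    static List<String> matching(List<String> docs, Set<String> query, int required) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            Set<String> common = tokens(doc);
            common.retainAll(query);
            if (common.size() >= required) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Start strict (100%, i.e. AND) and relax the threshold until something matches.
    static List<String> search(List<String> docs, String queryText) {
        Set<String> query = tokens(queryText);
        for (int percent : new int[] {100, 80, 70, 60, 50}) {
            List<String> hits = matching(docs, query, minShouldMatch(query.size(), percent));
            if (!hits.isEmpty()) {
                return hits;
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Stanford University School of Medicine, Palo Alto, CA USA",
            "School of Medicine, Harvard University, Boston MA",
            "Harvard University, Cambridge MA");
        for (String hit : search(docs, "School of Medicine, Stanford University, Palo Alto, CA")) {
            System.out.println(hit);
        }
    }
}
```

With the example affiliations from the question, the strict pass already separates the Stanford records from the Harvard ones. Calling matching() directly with the 50% threshold lets the two Harvard "School of Medicine" records back in, which is why it is worth stopping at the first threshold that returns anything.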