Hi Aaron,

Your "false positives" comments point to a mismatch between what you're 
currently asking Lucene for (any document matching any one of the terms in the 
query) and what you want (only fully "correct" matches).
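
To make the mismatch concrete, here is a toy model in Java. It is not Lucene itself: the tokenizer is a crude lowercase-and-split stand-in for an analyzer, and the class and method names are made up for illustration. With your full query string, "match any term" accepts the Harvard "School of Medicine" affiliation, while "match every term" rejects it:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only (hypothetical names): lowercases and splits on
// non-alphanumerics, roughly like an analyzer, then compares
// "any term matches" (OR) against "all terms match" (AND).
final class AnyVsAll {
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // OR semantics: the document matches if it shares any query term.
    static boolean matchesAny(Set<String> doc, Set<String> query) {
        for (String q : query) {
            if (doc.contains(q)) return true;
        }
        return false;
    }

    // AND semantics: every query term must be present in the document.
    static boolean matchesAll(Set<String> doc, Set<String> query) {
        return doc.containsAll(query);
    }

    public static void main(String[] args) {
        List<String> affiliations = Arrays.asList(
            "Institute of Neurobiology, School of Medicine, Stanford University, Palo Alto, CA",
            "School of Medicine, Harvard University, Boston MA",
            "Harvard University, Cambridge MA");
        Set<String> query = tokens("School of Medicine, Stanford University, Palo Alto, CA");
        for (String a : affiliations) {
            Set<String> doc = tokens(a);
            System.out.println("any=" + matchesAny(doc, query)
                + " all=" + matchesAll(doc, query) + "  " + a);
        }
    }
}
```

Requiring every term is usually too strict for data this inconsistent, which is why the point is to pick out only the terms that really must match.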

You need to identify the query terms that MUST match and tell Lucene about 
them (QueryParser interprets a leading "+" as marking a term as required).
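
A minimal sketch of marking required terms, assuming the resulting string is handed to a QueryParser whose default field is "Affiliation". The RequiredTerms class is hypothetical and only assembles the query string. If you build queries programmatically instead, the equivalent is a BooleanQuery whose required clauses use BooleanClause.Occur.MUST and whose optional clauses use BooleanClause.Occur.SHOULD.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: renders QueryParser syntax in which each required
// term gets a leading "+" and optional terms are left bare (they can
// still raise the score of documents that contain them).
final class RequiredTerms {
    static String build(List<String> required, List<String> optional) {
        StringJoiner q = new StringJoiner(" ");
        for (String t : required) q.add("+" + t);
        for (String t : optional) q.add(t);
        return q.toString();
    }

    public static void main(String[] args) {
        // "stanford" must match; "medicine" only influences ranking.
        System.out.println(build(List.of("stanford"), List.of("medicine")));
        // prints: +stanford medicine
    }
}
```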

If your queries come from sources that don't reliably match the indexed values, 
you may need to use synonyms to map between, e.g., "California" and "CA", and 
then require that at least one of the synonyms matches (e.g., "+(California 
CA)").
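
A sketch of the synonym approach, assuming you maintain your own table of alternate spellings (the map contents and the SynonymGroups class are hypothetical). Each concept becomes a required group in which any one spelling may match:

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical synonym handling: each concept becomes a required group
// "+(a b ...)", so at least one spelling in the group must match.
final class SynonymGroups {
    static String requireAny(List<String> spellings) {
        StringJoiner group = new StringJoiner(" ", "+(", ")");
        for (String s : spellings) group.add(s);
        return group.toString();
    }

    static String build(Map<String, List<String>> synonyms, List<String> concepts) {
        StringJoiner q = new StringJoiner(" ");
        for (String c : concepts) {
            // Fall back to the bare term when no synonyms are known.
            q.add(requireAny(synonyms.getOrDefault(c, List.of(c))));
        }
        return q.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms =
            Map.of("California", List.of("California", "CA"));
        System.out.println(build(synonyms, List.of("Stanford", "California")));
        // prints: +(Stanford) +(California CA)
    }
}
```

The generated terms still pass through QueryParser's analyzer, so provided the same analyzer is used at index and query time, "CA" is normalized the same way as the indexed text.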

Steve

On 03/23/2010 at 5:08 PM, Aaron Schon wrote:
> hi all, I have been playing with Lucene for a while now, but I'm stuck on a
> perplexing issue.
> 
> I have an index, with a field "Affiliation", some example values are:
> 
> - "Stanford University School of Medicine, Palo Alto, CA USA"
> - "Institute of Neurobiology, School of Medicine, Stanford University,
>   Palo Alto, CA"
> - "School of Medicine, Harvard University, Boston MA"
> - "Brigham & Women's, Harvard University School of Medicine, Boston, MA"
> - "Harvard University, Cambridge MA"
> 
> and so on... (the bottom-line being the affiliations are written in
> multiple ways with no apparent consistency)
> 
> I query the index on the affiliation field using, say, "School of
> Medicine, Stanford University, Palo Alto, CA" (with QueryParser) to
> find all Stanford-related documents, and I get a lot of false +ves,
> presumably because of the presence of "School of Medicine" etc.
> (note: I cannot use PhraseQuery because of the variability in the way
> affiliations are constructed)
> 
> I have tried the following:
> 
> 1. Use a SpanNearQuery by splitting the search phrase with a whitespace
> (here I get no results!)
> 2. Tried boosting (using ^) by splitting with the comma and boosting
> the last parts such as "Palo Alto CA" with a much higher boost than the
> initial phrases. Here I still get lots of false +ves.
> 
> Any suggestions on how to approach this? Is SpanNear the way to go? Any
> other ideas on why I get 0 results?
> 
> Thanks in advance for helping a newbie.
> 
> AS


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
