Hi, I am trying to move away from a system where I counted term frequencies by hand in a highlighter to decide whether a result was useful to me. In an earlier post on this list someone suggested I could boost the terms that matter to me and only accept hits above a certain score threshold. However, in my tests I can't find a deterministic way of calculating that threshold.
Here is an example of what I mean. My query:

    "John Smith" "John Smith Manufacturing" "San Francisco" "California"

Results are only useful to me if they contain the first term "John Smith" and/or the second term "John Smith Manufacturing", either alone or in any combination with the "San Francisco" and "California" terms. Results that contain only "San Francisco" or "California" can be ignored.

I tried something like:

    "John Smith"^200 "John Smith Manufacturing"^100 "San Francisco"^2 "California"^1

But using the term boosts and the resulting score, I can't find a good method of calculating a cut-off score that filters out the results matching only "San Francisco" or "California".

I also don't care about frequency: I want the result even if "John Smith" occurs only once, and I don't want a document that contains "San Francisco" a million times to score higher than a document with a single occurrence of "John Smith".

Sorry if that's confusing. Any ideas?

Thanks,
Max
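
P.S. In case it makes the requirement clearer, here is a rough sketch of the behaviour I'm after, written in plain Python rather than against any search library. The names (REQUIRED, WEIGHTS, is_useful, presence_score) are made up for illustration, matching is simple case-insensitive substring matching, and the weights are just the boosts from my query. It only shows the rule I want, not how to get a search engine's scorer to behave this way.

```python
# Hypothetical illustration of the acceptance rule; not tied to any search library.

# A document is useful only if it contains at least one of these phrases.
REQUIRED = ["John Smith", "John Smith Manufacturing"]

# Weights copied from the boosts in the query above (assumed values).
WEIGHTS = {
    "John Smith": 200,
    "John Smith Manufacturing": 100,
    "San Francisco": 2,
    "California": 1,
}


def is_useful(text: str) -> bool:
    """Accept a document only if at least one required phrase occurs in it."""
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in REQUIRED)


def presence_score(text: str) -> int:
    """Score by presence only: each phrase counts once, however often it occurs."""
    lowered = text.lower()
    return sum(weight for phrase, weight in WEIGHTS.items()
               if phrase.lower() in lowered)


if __name__ == "__main__":
    docs = [
        "John Smith was seen once.",
        "San Francisco " * 1000 + "California",
        "John Smith Manufacturing, San Francisco, California",
    ]
    for doc in docs:
        print(is_useful(doc), presence_score(doc), doc[:40])
```

With presence-only scoring like this, a document containing only the location terms can never score above 2 + 1 = 3, so any cut-off just above 3 would separate the two groups. What I can't work out is how to get the same effect from the boosted query scores.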