: So in a business scenario where we have to make a decision based on the
: "accepted" matching of a document (say perform activity A only when a
: document matches more than 50%), we won't be able to rely on the match score
: because the score will change based on our query, and sometimes an 80% match
: may not be as close as a 5% match with a slightly different query. (I know
: I am going back to % again :)
:
: So how do we handle such a scenario?

you have to redefine your criteria. "50% match" is meaningless -- you have
to decide what that means: does it mean matching half of the clauses in a
boolean query? what if a doc matches only 1/3 of the clauses, but it
matches them 100 times each? what if it matches 1/2 the clauses, 100 times
each, but that only makes up a tiny fraction of the total terms in that
document (ie: it's got the entire contents of wikipedia in every field)?
what if the query isn't a boolean query but a phrase query?
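
To make the ambiguity concrete, here is a minimal sketch (plain Python, not Lucene code) of two equally plausible readings of "percent match". The function names `clause_coverage` and `query_term_density` and the toy documents are made up for illustration. Both documents match half of the query terms, yet the two measures disagree wildly about how "close" each one is.

```python
def clause_coverage(query_terms, doc_tokens):
    """Fraction of distinct query terms that occur at least once in the doc."""
    terms = set(query_terms)
    present = set(doc_tokens)
    return sum(1 for t in terms if t in present) / len(terms)


def query_term_density(query_terms, doc_tokens):
    """Fraction of the doc's tokens that are query terms."""
    terms = set(query_terms)
    return sum(1 for t in doc_tokens if t in terms) / len(doc_tokens)


query = ["lucene", "score", "percent", "match"]

# A two-word doc that contains half of the query terms once each.
short_doc = ["lucene", "score"]

# A huge doc that contains the same two terms 100 times each,
# buried in 99,800 unrelated tokens.
long_doc = ["lucene"] * 100 + ["score"] * 100 + ["filler"] * 99800

for name, doc in (("short_doc", short_doc), ("long_doc", long_doc)):
    print(name, clause_coverage(query, doc), query_term_density(query, doc))
```

By clause coverage both documents are a "50% match"; by density one is 100% query terms and the other 0.2%. Neither number is wrong, they just answer different questions, which is why the criterion has to be spelled out first.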

if you have a constrained set of possible queries, and you can define
precisely what rules you care about, you can modify your similarity class
so that, regardless of the index, it produces scores that you *can* use to
make inferences given your rules.
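
As a rough illustration of that idea (a toy model in plain Python, not the Lucene Similarity API), the sketch below contrasts a tf-idf style score, which shifts whenever the index changes, with a "flattened" score in which tf, idf and length normalization are all treated as 1. The names `tfidf_score`, `flat_score` and `should_perform_activity`, the exact tf-idf formula, and the 0.5 threshold are assumptions chosen for the example. In Lucene the analogous step would be a custom similarity that returns constants for those factors; which methods to override depends on the version you run.

```python
import math


def idf(term, index):
    """Inverse document frequency; depends on every document in the index."""
    doc_freq = sum(1 for doc in index if term in doc)
    return 1.0 + math.log(len(index) / (doc_freq + 1))


def tfidf_score(query_terms, doc, index):
    """Index-dependent score: sum of sqrt(tf) * idf^2 over the query terms."""
    return sum(math.sqrt(doc.count(t)) * idf(t, index) ** 2 for t in query_terms)


def flat_score(query_terms, doc):
    """Index-independent score: tf, idf and length norm are all treated as 1,
    so the result is simply the fraction of query terms present in the doc."""
    return sum(1 for t in query_terms if t in doc) / len(query_terms)


def should_perform_activity(query_terms, doc, threshold=0.5):
    """Hypothetical business rule: act only when more than `threshold`
    of the query terms are matched."""
    return flat_score(query_terms, doc) > threshold


query = ["lucene", "score", "match"]
doc = ["lucene", "score", "lucene"]

small_index = [doc, ["other"]]
big_index = small_index + [["unrelated"]] * 98

# Same query, same doc: the tf-idf score moves when the index grows ...
print(tfidf_score(query, doc, small_index), tfidf_score(query, doc, big_index))
# ... while the flattened score stays put and can be compared to a threshold.
print(flat_score(query, doc), should_perform_activity(query, doc))
```

The flattened score throws away most of what makes relevance ranking useful, so this only makes sense when the rule you care about really is "how many of my clauses matched".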
See Also...
http://www.gossamer-threads.com/lists/lucene/java-user/61075
http://markmail.org/thread/3svvskbay4hpqyms
http://markmail.org/message/lztdm4xosmceup5t
And a real oldie but goodie...
http://markmail.org/message/5eipstcu6lky2h2j
-Hoss