On 05/03/2012 19:26, Chris Hostetter wrote:
: very small to occasionally very large. It also might be the case that
: cover letters and e-mails, while short, might not really be something to
: heavily discount. The lower discount range can be ignored by setting
: the min of any sweet spot to 1. Then one starts to wonder if there
: really is any level area.
I would definitely not suggest using SSS for fields like legal brief text
or emails where there is huge variability in the length of the content --
i can't think of any context where a "short" email is definitively
better/worse than a "long" email. more traditional TF/IDF seems like it
would make more sense there.
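
For anyone trying to picture the "level area" and the effect of "steepness", here is a minimal standalone sketch of the length norm formula as given in the SweetSpotSimilarity javadoc: 1/sqrt(steepness * (|x-min| + |x-max| - (max-min)) + 1). It does not call Lucene at all; the class name SweetSpotNormSketch, the method name lengthNorm, and the sample min/max/steepness values are assumptions made up for illustration, so check the real class before relying on it.

```java
public class SweetSpotNormSketch {

    // Length norm per the formula in the SweetSpotSimilarity javadoc:
    //   1 / sqrt( steepness * (|x - min| + |x - max| - (max - min)) + 1 )
    // For min <= x <= max the inner term is 0, so the norm is exactly 1
    // (the flat "sweet spot"). Outside that range the norm falls off,
    // faster for larger steepness.
    static double lengthNorm(int numTerms, int min, int max, double steepness) {
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        int min = 1;    // hypothetical sweet spot: 1..100 terms
        int max = 100;
        int[] lengths = {1, 50, 100, 150, 300, 1000};
        double[] steepnessValues = {0.1, 0.5, 2.0};

        for (double steepness : steepnessValues) {
            System.out.println("steepness = " + steepness);
            for (int numTerms : lengths) {
                System.out.printf("  numTerms = %4d  norm = %.4f%n",
                        numTerms, lengthNorm(numTerms, min, max, steepness));
            }
        }
    }
}
```

With min set to 1 there is no "too short" penalty, every length up to max gets a norm of 1, and only documents longer than max are discounted. Raising steepness makes that discount bite sooner.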
: When I get that deep in the code the issue is not simply the shape of
: the equation, but issues like how tweaking any parameter affects the
: overall document scores. For example, consider the comments about
: "steepness" related to length norm. They cover (some of) the mathematics
: of the equation, but until one spends some time with that equation and
: understands how the pieces fit together, I doubt it jumps out at most
: folks what larger or smaller values mean for terms and resulting document
: scores.
:
: One obvious hard to tease out part of the Similarity API is when each
: part is called -- the simplest being index time vs. search time -- there
well ... hopefully the Similarity docs and the docs on Lucene scoring
have filled in most of those blanks before you drill down into the
specifics of how SSS works. if not, then any concrete improvements you can
suggest would certainly be appreciated...
https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/index.html
https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/Similarity.html
https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/site/build/site/scoring.html?view=co
Chapter 12, "Document Ranking", in Hibernate Search in Action gives a
thorough explanation of Lucene scoring and the Similarity class, which
I've found helpful. I think it's worth mentioning since it's not the most
obvious book for this subject.
Paul