> I would definitely not suggest using SSS for fields like legal brief text or
> emails where there is huge variability in the length of the content -- I can't
> think of any context where a "short" email is definitively better/worse than a
> "long" email. More traditional TF/IDF seems like it would make more sense
> there.

I was coming to a similar conclusion.

> well ... hopefully the Similarity docs and the docs on Lucene scoring have
> filled in most of those blanks before you drill down into the specifics of
> how SSS works. If not, then any concrete improvements you can suggest would
> certainly be appreciated...
>
> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/index.html
> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/Similarity.html
>
> https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/site/build/site/scoring.html?view=co

Thanks for the links. The first thing I notice is that what is listed at the
top of Similarity has totally changed. Great stuff about the object
interaction. For example, I didn't understand how the Weight object fit in
until reading that.

But I see I got what I asked for. Someone thought describing the object
interaction was more important than the scoring formula itself. I'll chew on
it (but I'm currently using the 3.4 code).

My only thought is that the new stuff seems to be at the expense of the
formulas listed in the old class overview for Similarity.

http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/Similarity.html

I would think that some of the old math, particularly the formula as it
corresponds to the methods, would still be useful information, even if I can't
claim to know where it might be placed. Maybe something like the site scoring
page could talk about how the arithmetic maps to the methods and how phrase
scoring messes with scoring.

Just my $0.02

thanks
-Paul