On 05/03/2012 23:24, Robert Muir wrote:
On Mon, Mar 5, 2012 at 6:01 PM, Paul Hill<p...@metajure.com> wrote:
I would definitely not suggest using SSS for fields like legal brief text or
emails where there is huge variability in the length of the content -- I can't
think of any context where a "short" email is definitively better/worse than a
"long" email. More traditional TF/IDF seems like it would make more sense
there.
I was coming to a similar conclusion.
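To make the length point concrete, here is a rough sketch in plain Java (no Lucene dependency) comparing the two length norms as I understand them: the classic 1/sqrt(numTerms) and a sweet-spot style norm that stays flat between a min and a max length and falls off outside that range. The class name, the min/max/steepness values and the sample lengths are made up for illustration; this is not the Lucene source.

```java
public class LengthNormSketch {

    // Classic length norm (ignoring boosts): shorter fields always score higher.
    public static double classicNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Sweet-spot style norm: exactly 1.0 for lengths in [min, max],
    // decaying outside that plateau at a rate set by steepness.
    public static double sweetSpotNorm(int numTerms, int min, int max, double steepness) {
        // Zero inside [min, max]; twice the distance to the nearest edge outside it.
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical plateau of 100..1000 terms with steepness 0.5.
        int[] lengths = {50, 500, 5000, 50000};
        for (int n : lengths) {
            System.out.printf("%6d terms: classic=%.4f sweetSpot=%.4f%n",
                    n, classicNorm(n), sweetSpotNorm(n, 100, 1000, 0.5));
        }
    }
}
```

With content whose length varies over several orders of magnitude there is no obvious plateau to pick, so most documents land on the decaying part of the curve whatever min and max you choose.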
well ... hopefully the Similarity docs and the docs on Lucene scoring have
filled in most of those blanks before you drill down into the specifics of how
SSS works. if not, then any concrete improvements you can suggest would
certainly be appreciated...
https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/index.html
https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/Similarity.html
https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/site/build/site/scoring.html?view=co
Thanks for the links.
The first thing I notice is that what is listed at the top of Similarity is
totally changed. Great stuff about the object interaction. For example, I
didn't understand how the Weight object fit in until reading that.
But I see I got what I asked for. Someone thought describing the object
interaction was more important than the scoring formula itself. I'll chew on it
(but I'm currently using the 3.4 code).
My only thought is that the new stuff seems to be at the expense of the
formulas listed in the old class overview for Similarity.
Hello,
What was previously Similarity in older releases has moved to
TFIDFSimilarity: it extends Similarity and exposes a vector-space API,
with the same formulas in its javadocs:
https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
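For anyone reading along without the javadocs open, the per-term pieces of those formulas can be sketched roughly as below. This is plain Java and only my reading of the default implementation (tf as the square root of the frequency, idf as 1 + ln(numDocs / (docFreq + 1)), length norm as 1/sqrt(numTerms)); it leaves out coord, queryNorm and boosts, and the class and method names are invented for the example.

```java
public class ClassicTfIdfSketch {

    // Term-frequency factor: grows with the square root of the raw frequency.
    public static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: rarer terms get a larger weight.
    public static double idf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Contribution of one query term to one document's score,
    // ignoring coord, queryNorm and all boosts: tf * idf^2 * lengthNorm.
    public static double termWeight(double freq, long docFreq, long numDocs, int numTerms) {
        double idfValue = idf(docFreq, numDocs);
        double lengthNorm = 1.0 / Math.sqrt(numTerms);
        return tf(freq) * idfValue * idfValue * lengthNorm;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: term occurs 4 times in a 100-term field,
        // and in 9 of the 10 documents in the index.
        System.out.println(termWeight(4, 9, 10, 100));
    }
}
```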
Looks good. Do you know if this stuff will make it into 3.6?
Paul