On 05/03/2012 23:24, Robert Muir wrote:
On Mon, Mar 5, 2012 at 6:01 PM, Paul Hill<p...@metajure.com>  wrote:
I would definitely not suggest using SSS for fields like legal brief text or
emails, where the length of the content varies hugely. I can't think of any
context where a "short" email is definitively better or worse than a "long"
email. More traditional TF/IDF seems like it would make more sense there.
I was coming to a similar conclusion.
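
To make the length point concrete, here is a minimal sketch comparing the two styles of length normalization. It does not use Lucene classes. The class and method names are hypothetical, the plateau formula is written from memory of the SweetSpotSimilarity javadocs, and the min/max/steepness values are illustrative assumptions rather than recommended settings.

```java
// Hypothetical standalone sketch; not part of Lucene.
final class LengthNormSketch {

    // Classic TF/IDF length norm: decays smoothly as the field gets longer.
    static double classicLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Plateau-style norm (assumed form, from memory of the SweetSpotSimilarity javadocs):
    // fields with min <= numTerms <= max all get 1.0, and the norm
    // drops off on either side of that range.
    static double sweetSpotLengthNorm(int numTerms, int min, int max, double steepness) {
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        // Illustrative plateau: 100 to 1000 terms, steepness 0.5 (assumed values).
        int[] lengths = {50, 500, 5000};
        for (int n : lengths) {
            System.out.printf("%d terms: classic=%.4f sweetSpot=%.4f%n",
                    n, classicLengthNorm(n), sweetSpotLengthNorm(n, 100, 1000, 0.5));
        }
    }
}
```

With a plateau like this, anything outside the chosen range is penalized, which only makes sense if there really is a "right" length for the field. For emails or legal briefs there isn't one, which is the concern raised above.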

Well ... hopefully the Similarity docs and the docs on Lucene scoring have
filled in most of those blanks before you drill down into the specifics of
how SSS works. If not, then any concrete improvements you can suggest would
certainly be appreciated...

https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/index.html
https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/Similarity.html

https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/site/build/site/scoring.html?view=co
Thanks for the links.
The first thing I notice is that what is listed at the top of Similarity has
totally changed.  Great stuff about the object interaction. For example, I
didn't understand how the Weight object fits in until reading that.
But I see I got what I asked for.  Someone thought describing the object
interaction was more important than the scoring formula itself.  I'll chew on
it (but I'm currently using the 3.4 code).

My only thought is that the new stuff seems to be at the expense of the 
formulas listed in the old class overview for Similarity.
Hello,

What was Similarity in older releases has moved to TFIDFSimilarity: it
extends Similarity and exposes a vector-space API, with the same formulas
in its javadocs:
https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
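
For readers without the javadocs at hand, the core of those formulas can be sketched as plain arithmetic. This is a simplified illustration with hypothetical names, not the Lucene API: it uses the default tf and idf definitions from the 3.x DefaultSimilarity and deliberately leaves out coord, queryNorm, boosts, and the lossy encoding of norms.

```java
// Hypothetical standalone sketch; not part of Lucene.
final class ClassicTfIdfSketch {

    // Term frequency factor: square root of the raw in-document frequency.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: rarer terms get a larger factor.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    // Length norm: shorter fields score higher for the same term frequency.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // One query term's contribution to a document's score,
    // ignoring coord, queryNorm, and all boosts (a simplifying assumption).
    static double termWeight(double freq, long docFreq, long numDocs, int numTerms) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a term occurring 4 times in a 100-term field,
        // appearing in 9 of 10 documents.
        System.out.println(termWeight(4.0, 9, 10, 100));
    }
}
```

The idf factor is squared because it appears once in the query weight and once in the document weight.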

Looks good. Do you know if this stuff will make it into 3.6?

Paul

