> I would definitely not suggest using SSS for fields like legal brief text or
> emails where there is huge variability in the length of the content -- I can't
> think of any context where a "short" email is definitively better/worse than a
> "long" email.  More traditional TF/IDF seems like it would make more sense
> there.

I was coming to a similar conclusion.
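
To make the length effect concrete, here is a rough stand-alone sketch in plain Java (it does not use the Lucene classes). It assumes SSS means SweetSpotSimilarity and compares the classic 1/sqrt(numTerms) length norm with the plateau-style norm given in the SweetSpotSimilarity javadoc. The class name, the min/max/steepness values and the sample lengths are made up for illustration; field boost and the lossy norm encoding are ignored.

```java
// Hypothetical stand-alone sketch; not the Lucene implementation.
public class LengthNormSketch {

    // Classic length norm: shorter fields always score higher.
    static double classicNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Plateau-style norm as given in the SweetSpotSimilarity javadoc:
    // 1 / sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1)
    static double sweetSpotNorm(int numTerms, int min, int max, double steepness) {
        double penalty = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * penalty + 1.0);
    }

    public static void main(String[] args) {
        int min = 50, max = 500;   // assumed "sweet spot" range, arbitrary
        double steepness = 0.5;    // assumed value, arbitrary
        for (int len : new int[] {10, 50, 200, 500, 5000}) {
            System.out.printf("%5d terms: classic=%.4f sweetSpot=%.4f%n",
                    len, classicNorm(len), sweetSpotNorm(len, min, max, steepness));
        }
    }
}
```

With these assumed settings every length inside the min..max range gets the same norm of 1.0, and lengths outside it are penalized in both directions. That only helps when some length range really is "best", which is hard to argue for emails.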

> well ... hopefully the Similarity docs and the docs on Lucene scoring have
> filled in most of those blanks before you drill down into the specifics of how
> SSS works.  if not, then any concrete improvements you can suggest would
> certainly be appreciated...
> 
> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/index.html
> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/Similarity.html
> 
> https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/site/build/site/scoring.html?view=co

Thanks for the links.
The first thing I notice is that what is listed at the top of Similarity has
totally changed.  Great stuff about the object interaction. For example, I
didn't understand how the Weight object fit in until reading that.
But I see I got what I asked for.  Someone thought describing the object
interaction was more important than the scoring formula itself.  I'll chew on
it (but I'm currently using the 3.4 code).

My only thought is that the new stuff seems to be at the expense of the 
formulas listed in the old class overview for Similarity.
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/Similarity.html
I would think that some of the old math, particularly the formula as it 
corresponds to the methods, would still be useful information even if I can't 
claim to know where it might be placed.
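
As an illustration of the kind of mapping meant here, the following is a rough plain-Java sketch of the practical scoring formula from the 3.x Similarity overview, score(q,d) = coord * queryNorm * sum over t of (tf * idf^2 * boost * norm), with one small method per Similarity method. It follows the DefaultSimilarity definitions as I read them in the 3.5 javadoc. The class name and the score() helper are hypothetical, all boosts are assumed to be 1.0, and the one-byte norm encoding is ignored.

```java
// Hypothetical stand-alone sketch; not the Lucene implementation.
public class ClassicScoreSketch {

    static double tf(double freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double lengthNorm(int numTerms) { return 1.0 / Math.sqrt(numTerms); }

    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // freqs[i]    = frequency of query term i in the document's field
    // docFreqs[i] = number of documents containing query term i
    static double score(double[] freqs, int[] docFreqs, int numDocs, int fieldLength) {
        double sumOfSquaredWeights = 0.0;
        for (int df : docFreqs) {
            double w = idf(df, numDocs);       // term boost assumed to be 1.0
            sumOfSquaredWeights += w * w;
        }
        int overlap = 0;
        double sum = 0.0;
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] > 0) {                // only matching terms contribute
                overlap++;
                double w = idf(docFreqs[i], numDocs);
                sum += tf(freqs[i]) * w * w * lengthNorm(fieldLength);
            }
        }
        return coord(overlap, freqs.length) * queryNorm(sumOfSquaredWeights) * sum;
    }
}
```

This covers only plain term queries; phrase scoring is not modeled.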

Maybe something like the site scoring page could talk about how the arithmetic
maps to the methods and how phrase scoring messes with scoring.
Just my $0.02

thanks

-Paul

