[ 
https://issues.apache.org/jira/browse/LUCENE-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754610#action_12754610
 ] 

Doron Cohen commented on LUCENE-1908:
-------------------------------------

{quote}
I'm still a little confused I guess 
{quote}

That makes two of us... :)

{quote}
The longer docs will have larger weights naturally is what I meant, but larger 
weights actually hurt in the cosine normalization - so it actually over 
punishes I guess? I dunno - all of this over punish / under punish is in 
comparison to a relevancy curve they figure out (a probability of relevance as 
a function of document length), and how the pivoted cosine curves compare 
against it. I'm just reading across random interweb pdfs from google. 
Apparently our pivot also over punishes large docs and over favors small, the 
same as this one, but perhaps not as bad? I'm seeing that in a Lucene/Juru 
research pdf. This stuff is hard to grok on first pass.
{quote}

In that work we got the best results from Lucene (for TREC) with SweetSpot 
similarity and normalizing tf by the average term-freq in the doc.
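
To make "normalizing tf by average term-freq in doc" a bit more concrete, here 
is a rough standalone sketch. It does not use the Lucene API, the class and 
method names are made up for illustration, and the log form 
log(1 + freq) / log(1 + avgFreq) is an assumption about one plausible shape of 
such a normalization, not necessarily the exact formula used in that work.

```java
import java.util.Map;

public class AvgTfNormalization {

    // Average raw frequency over the distinct terms of one document.
    static double averageTermFreq(Map<String, Integer> termFreqs) {
        if (termFreqs.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (int freq : termFreqs.values()) {
            total += freq;
        }
        return (double) total / termFreqs.size();
    }

    // ASSUMPTION: tf is normalized as log(1 + freq) / log(1 + avgFreq).
    // This is an illustrative form, not a formula confirmed by the thread.
    static double normalizedTf(int freq, double avgFreq) {
        if (freq <= 0 || avgFreq <= 0.0) {
            return 0.0;
        }
        return Math.log(1.0 + freq) / Math.log(1.0 + avgFreq);
    }

    public static void main(String[] args) {
        // Hypothetical per-document term frequencies.
        Map<String, Integer> doc = Map.of("lucene", 4, "score", 1, "norm", 1);
        double avg = averageTermFreq(doc);
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            System.out.println(e.getKey() + " -> " + normalizedTf(e.getValue(), avg));
        }
    }
}
```

Under this assumed form, a term that occurs exactly the average number of times 
gets 1.0, terms above the average get more and terms below get less, so the 
score depends on how a term stands out within its doc rather than on raw counts.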

For me it was mainly experimental and intuitive, but I was not able to convince 
Hoss (or even convince myself once I read Hoss's comments) that this was 
justified theoretically. 

At that time I was not aware of the V(d) normalization subtlety we are 
discussing now. I think I understand things better now, but I am still not 
sure. Need to read 
http://nlp.stanford.edu/IR-book/html/htmledition/pivoted-normalized-document-length-1.html
 and sleep on it... 

> Similarity javadocs for scoring function to relate more tightly to scoring 
> models in effect
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1908
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1908
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1908.patch, LUCENE-1908.patch
>
>
> See discussion in the related issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
