Thanks Paolo, the issue was created. Please check.
https://issues.apache.org/jira/browse/JENA-242

-----Original Message-----
From: Paolo Castagna [mailto:castagna.li...@googlemail.com] 
Sent: Thursday, May 03, 2012 6:08 PM
To: jena-users@incubator.apache.org
Subject: Re: LARQ scores not normalized (Was: [ANN] Release of Apache Jena
LARQ 1.0.0-incubating)

Hi Tao,
please, go ahead and open a JIRA issue for this.
(I can do that if you prefer, but you found it and you should be the
'reporter' of the issue).

Thanks,
Paolo

Tao (陶信东) wrote:
> Thanks Paolo. I want normalized scores to filter SPARQL results (so 
> that only items above a certain quality are shown).
> 
> I know Lucene scores cannot ensure the quality of a search for the RDF 
> literals. So maybe we should re-score LARQ with something else, e.g. 
> minimal edit distance?
> 
> Thanks
> Tao
> 
> -----Original Message-----
> From: Paolo Castagna [mailto:castagna.li...@googlemail.com]
> Sent: Thursday, May 03, 2012 4:38 PM
> To: jena-users@incubator.apache.org
> Subject: Re: LARQ scores not normalized (Was: [ANN] Release of Apache 
> Jena LARQ 1.0.0-incubating)
> 
> By the way, Tao, why do you want/need normalized scores?
> 
>  "score values are meaningful only for purposes of comparison between
>   other documents for the exact same query and the exact same index.
>   when you try to compute a percentage, you are setting up an implicit
>   comparison with scores from other queries."
>   -- http://wiki.apache.org/lucene-java/ScoresAsPercentages
> 
> So, perhaps, we should just keep it as it is and return to the users 
> scores as we get them from Lucene (i.e. not normalized).
> 
> What do you think?
> 
> I imagine people would use scores for sorting results and/or find the 
> highest match. Tao, are you using the scores for something else?
> 
> Paolo
> 
> Paolo Castagna wrote:
>> Tao wrote:
>>> Hi Paolo,
>>>
>>> Just noticed some change in the LARQ score. Originally the score 
>>> seemed to be normalized to range [0, 1]. Now the score can be higher 
>>> than 1. Is this a change of Lucene or LARQ?
>>>
>>> How can I get the good old [0, 1] LARQ score now?
>>>
>>> Thanks
>>> Tao
>> Hi Tao,
>> first of all, thanks.
>>
>> I see... LARQ is now using Lucene 3.x and something might have 
>> changed there or something went wrong while porting LARQ over to the
>> new Lucene 3.x APIs.
>>
>> Do you want to raise a JIRA issue for this?
>> https://issues.apache.org/jira/browse/JENA
>>
>> The good news is that it should not be that difficult to fix and if 
>> you want you can try submitting a patch for this.
>>
>> All searches call the IndexLARQ.search(...) [1] method which does 
>> something like this (reformatted):
>>
>>   TopDocs topDocs = ...
>>   Map1<ScoreDoc,HitLARQ> converter = new Map1<ScoreDoc,HitLARQ>(){
>>     public HitLARQ map1(ScoreDoc object) {
>>       return new HitLARQ(searcher, object) ;
>>     }} ;
>>   Iterator<ScoreDoc> iterScoreDoc =
>>     Arrays.asList(topDocs.scoreDocs).iterator() ;
>>   Iterator<HitLARQ> iter =
>>     new Map1Iterator<ScoreDoc, HitLARQ>(converter, iterScoreDoc) ;
>>   return iter ;
>>
>> There is a getMaxScore method in Lucene's TopDocs [2] which we can 
>> use to normalize scores for the same query.
>>
>> Paolo
>>
>>  [1]
>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/src/main/java/org/apache/jena/larq/IndexLARQ.java
>>  [2]
>> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/search/TopDocs.html#getMaxScore%28%29
> 
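
The normalization Paolo suggests (dividing each hit's score by the maximum score for the same query) could be sketched roughly as below. This is only an illustration and not the LARQ code: the Hit class and the normalize method are hypothetical stand-ins, so the sketch runs without Lucene on the classpath. In LARQ the raw scores would presumably come from topDocs.scoreDocs and the maximum from TopDocs.getMaxScore().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NormalizedScores {

    /** Hypothetical stand-in for a search hit: a document id plus its raw score. */
    public static final class Hit {
        public final int doc;
        public final float score;

        public Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Returns new hits whose scores are divided by maxScore, so the best hit
     * of this query gets 1.0. If maxScore is zero, negative or NaN, every
     * score is set to 0 instead of dividing (an assumption of this sketch).
     */
    public static List<Hit> normalize(List<Hit> hits, float maxScore) {
        List<Hit> result = new ArrayList<Hit>(hits.size());
        for (Hit hit : hits) {
            // The comparison is false for NaN, so NaN falls through to 0.
            float normalized = (maxScore > 0f) ? hit.score / maxScore : 0f;
            result.add(new Hit(hit.doc, normalized));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up raw scores for one query; 2.0 plays the role of the max score.
        List<Hit> hits = Arrays.asList(new Hit(7, 2.0f), new Hit(3, 0.5f));
        for (Hit hit : normalize(hits, 2.0f)) {
            System.out.println(hit.doc + " " + hit.score);
        }
    }
}
```

Scores scaled this way are relative to the best hit of the same query only. As the Lucene wiki passage quoted above points out, they are still not comparable across different queries, so a fixed cut-off on them is a heuristic at best.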
