Thanks Paolo, the issue was created. Please check. https://issues.apache.org/jira/browse/JENA-242
-----Original Message-----
From: Paolo Castagna [mailto:castagna.li...@googlemail.com]
Sent: Thursday, May 03, 2012 6:08 PM
To: jena-users@incubator.apache.org
Subject: Re: LARQ scores not normalized (Was: [ANN] Release of Apache Jena LARQ 1.0.0-incubating)

Hi Tao,
please, go ahead and open a JIRA issue for this. (I can do that if you prefer, but you found it and you should be the 'reporter' of the issue).

Thanks,
Paolo

Tao (陶信东) wrote:
> Thanks Paolo. I want normalized scores to filter sparql results (so
> that only items above certain quality is shown).
>
> I know Lucene scores cannot ensure the quality of a search for the RDF
> literals. So maybe we should re-score LARQ with something else, e.g.
> minimal edit distance?
>
> Thanks
> Tao
>
> -----Original Message-----
> From: Paolo Castagna [mailto:castagna.li...@googlemail.com]
> Sent: Thursday, May 03, 2012 4:38 PM
> To: jena-users@incubator.apache.org
> Subject: Re: LARQ scores not normalized (Was: [ANN] Release of Apache
> Jena LARQ 1.0.0-incubating)
>
> By the way, Tao, why do you want/need normalized scores?
>
> "score values are meaningful only for purposes of comparison between
> other documents for the exact same query and the exact same index.
> when you try to compute a percentage, you are setting up an implicit
> comparison with scores from other queries."
> -- http://wiki.apache.org/lucene-java/ScoresAsPercentages
>
> So, perhaps, we should just keep it as it is and return to the users
> scores as we get them from Lucene (i.e. not normalized).
>
> What do you think?
>
> I imagine people would use scores for sorting results and/or find the
> highest match. Tao, are you using the scores for something else?
>
> Paolo
>
> Paolo Castagna wrote:
>> Tao wrote:
>>> Hi Paolo,
>>>
>>> Just noticed some change in the LARQ score. Originally the score
>>> seemed to be normalized to range [0, 1]. Now the score can be higher
>>> than 1. Is this a change of Lucene or LARQ?
>>>
>>> How can I get the old good [0, 1] LARQ score now?
>>>
>>> Thanks
>>> Tao
>> Hi Tao,
>> first of all, thanks.
>>
>> I see... LARQ is now using Lucene 3.x and something might have
>> changed there or something went wrong while porting LARQ over Lucene 3.x new APIs.
>>
>> Do you want to raise a JIRA issue for this?
>> https://issues.apache.org/jira/browse/JENA
>>
>> The good news is that it should not be that difficult to fix and if
>> you want you can try submitting a patch for this.
>>
>> All searches call the IndexLARQ.search(...) [1] method which does
>> something like this (reformatted):
>>
>> TopDocs topDocs = ...
>> Map1<ScoreDoc,HitLARQ> converter = new Map1<ScoreDoc,HitLARQ>(){
>>     public HitLARQ map1(ScoreDoc object) {
>>         return new HitLARQ(searcher, object) ;
>>     }} ;
>> Iterator<ScoreDoc> iterScoreDoc =
>>     Arrays.asList(topDocs.scoreDocs).iterator() ;
>> Iterator<HitLARQ> iter =
>>     new Map1Iterator<ScoreDoc, HitLARQ>(converter, iterScoreDoc) ;
>> return iter ;
>>
>> There is a getMaxScore method in Lucene's TopDocs [2] which we can
>> use to normalize scores for the same query.
>>
>> Paolo
>>
>> [1]
>> http://svn.apache.org/repos/asf/incubator/jena/Jena2/LARQ/trunk/src/main/java/org/apache/jena/larq/IndexLARQ.java
>> [2]
>> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/search/TopDocs.html#getMaxScore%28%29
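
For illustration, here is a minimal sketch of the per-query normalization Paolo suggests in the thread, i.e. dividing each hit's score by the maximum score of the same query. It is not the actual LARQ patch: the class ScoreNormalizer and its normalize method are hypothetical names, the float array stands in for the ScoreDoc.score values of one TopDocs, and the maxScore argument stands in for the value returned by TopDocs.getMaxScore(). Plain arrays are used so the sketch runs without Lucene on the classpath.

```java
import java.util.Arrays;

// Hypothetical helper, not part of LARQ or Lucene: scales the raw scores of
// ONE query by that query's maximum score so they fall in the range [0, 1].
class ScoreNormalizer {

    // rawScores: assumed to be the ScoreDoc.score values of a single TopDocs.
    // maxScore:  assumed to be what TopDocs.getMaxScore() returned for it.
    static float[] normalize(float[] rawScores, float maxScore) {
        float[] normalized = new float[rawScores.length];
        // Guard: a zero, negative or NaN max score (for example, no hits)
        // cannot be used as a divisor, so return all zeros in that case.
        if (!(maxScore > 0.0f)) {
            return normalized;
        }
        for (int i = 0; i < rawScores.length; i++) {
            // Clamp to 1.0 in case a raw score exceeds the supplied maximum.
            normalized[i] = Math.min(1.0f, rawScores[i] / maxScore);
        }
        return normalized;
    }

    public static void main(String[] args) {
        float[] raw = {2.4f, 1.2f, 0.6f};
        System.out.println(Arrays.toString(normalize(raw, 2.4f)));
    }
}
```

With this scheme the best hit of every query scores 1.0, so, as the Lucene wiki quote in the thread warns, the normalized values are still only comparable within a single query and a fixed cut-off does not measure absolute match quality.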