That's exactly the question. If all 16 documents have exactly the same score, then the internal tie-breaking is your answer. They would also all have strictly increasing doc IDs.
But I'd check to see the scores before accepting this explanation because I find it unlikely that all 16 docs have identical scores. But it's worth checking out before looking for more complex answers..... Best Erick On Mon, Nov 2, 2009 at 8:11 AM, kenji tsuruoka <kenji.tsuru...@gmail.com>wrote: > Thank you Erick. > > What you mentioned is right. > The two same documents were shown at the 3rd and 18th. > > So do you mean documents between the 3rd and the 18th (at least) in the > Lucene results have the same score? > > Cheers, > K > > > On Nov 2, 2009, at 9:59 PM, Erick Erickson wrote: > > What were their scores? I'm assuming that by "rank" you mean >> the order in which the documents were returned, not the raw Lucene >> score. >> >> Lucene uses the insertion order to break ties. That is, two documents >> with the same score will the appear in the order of their (internal) >> Lucene doc ID. >> >> So is it possible that *all* of the documents that appear between these >> two have the exact same score for that query? That seems a bit >> unlikely, but it's worth checking before going much further..... >> >> Best >> Erick >> >> On Mon, Nov 2, 2009 at 7:45 AM, kenji tsuruoka <kenji.tsuru...@gmail.com >> >wrote: >> >> Dear. Lucene users. >>> >>> Hi. >>> I have tried to index and search MEDLINE abstracts by LUCENE. >>> >>> And there were some problems in the search results. >>> That is Lucene has assigned different ranks for the exactly same >>> documents. >>> >>> I didn't know the input documents for the index contain duplicate >>> documents >>> at the first time. >>> I have solve the problem by making all input documents UNIQUE for the >>> index. >>> >>> But I want to know how and why the situation was happened. >>> >>> The duplicate document is as follows: >>> >>> _pubmed_id=13029105:1952Nov15 >>> _ArticleTitle_ >>> <s n="1">Experimental diabetes and clinical diabetes.</s> >>> _pubmed_id_end_ >>> >>> There are TWO exactly same documents in "index". >>> And their rankings by Lucene are 3 and 18. >>> >>> I have known texts in XML/HTML data should be extracted before indexing. >>> Anyway, I haven't done this work now. >>> >>> Please let me know the reason why the same documents were shown different >>> ranks. >>> >>> Best, >>> K >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>> >>> >>> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >