Hi,

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
> Sent: Wednesday, 26 September 2001 18:38
> To: [EMAIL PROTECTED]
> Subject: Re: [htdig] new version and phrase relevance ranking
>
>
> At 11:23 PM -0400 9/25/01, Geoff Hutchison wrote:
...
> >I'm assuming you're talking about some sort of proximity ranking. In other
> >words, if you performed a regular query and the queried words fell close
> >together on the page, it would score higher.
> >
> >Yes, this is certainly considered. The catch is coming up with a way to
> >score this quickly. It seems like mathematically you want to compute
> >something like the minimum distance between all words in the query. But
> >this seems a bit costly. Certainly if you know of references on computing
> >this proximity quickly, I'd be interested to read them.
>
> I mean that if the words are next to each other, they are related --
> doesn't have to be a more complex proximity matching than that. If
> you store the word offsets in the index entries, you can simply check
> to see if they're next to each other (n+1), if so, you've got a
> really fine hit and should weight it very heavily.
>
Both things can be done with the match location information we have now.

A naive solution to Avi's approach would be to perform a phrase search with all the words in the query, in order of appearance. But this is usable only if we want *all* the words in the query to appear in the results, and it does not take into account partial matches (partially unordered, partially missing). Some more sophisticated variant of the current Phrase could be hacked together.

BTW, both solutions (min distance and pseudo-phrase) should have linear cost in the number of matches if the matches to be examined are ordered by location, which is the case IIRC.

Am I wrong, or is there another issue I can't see, Geoff?

-- 
Quim
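
P.S. To make the linear-cost claim concrete, here is a minimal sketch in Python. It is not ht://Dig code: the function names `min_span` and `adjacent_pairs` and the `(location, word)` tuple layout are made up for illustration. It assumes the matches for a document arrive already sorted by location and that each location holds a single word. `min_span` reads "minimum distance between all words" as the smallest span of locations that contains every query word at least once. `adjacent_pairs` counts neighbours at offsets n and n+1 that follow query order, so it also gives credit to partial matches.

```python
def min_span(matches, query_words):
    """Smallest location span covering every query word, or None.

    matches: list of (location, word) tuples sorted by location (assumed).
    Sliding window: each hit enters and leaves the window at most once,
    so the cost is linear in the number of matches.
    """
    needed = set(query_words)
    hits = [(loc, w) for loc, w in matches if w in needed]
    counts = {}   # word -> occurrences inside the current window
    best = None
    left = 0      # index of the first hit still inside the window
    for loc, word in hits:
        counts[word] = counts.get(word, 0) + 1
        # While the window covers all words, record it and shrink from the left.
        while len(counts) == len(needed):
            span = loc - hits[left][0]
            if best is None or span < best:
                best = span
            left_word = hits[left][1]
            counts[left_word] -= 1
            if counts[left_word] == 0:
                del counts[left_word]
            left += 1
    return best


def adjacent_pairs(matches, query_words):
    """Count neighbouring matches (offsets n and n+1) that follow query order."""
    follows = set(zip(query_words, query_words[1:]))
    count = 0
    for (loc_a, w_a), (loc_b, w_b) in zip(matches, matches[1:]):
        if loc_b == loc_a + 1 and (w_a, w_b) in follows:
            count += 1
    return count
```

For example, with the query "phrase relevance ranking" and matches at locations 3 (phrase), 10 (relevance), 11 (ranking), 40 (phrase) and 42 (relevance), `min_span` finds the window from 3 to 11 and `adjacent_pairs` finds the one pair at 10 and 11. How either number should be turned into a weight is a separate question that this sketch does not address.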

