Re: positional token info

Erik Hatcher Tue, 21 Oct 2003 16:18:33 -0700

On Tuesday, October 21, 2003, at 12:53 PM, Doug Cutting wrote:

> If however you want "phone the boy" to match "phone X boy" where X is any word, then PhraseQuery would have to be extended. It's actually a pretty simple extension. Each term in a PhraseQuery corresponds to a PhrasePositions object. The 'offset' field within this is the position of the term in the phrase. If you construct the phrase positions for a two-term phrase so that the first has offset=0 and the second offset=2, then you'll get this sort of matching. So all that's needed is a new method PhraseQuery.add(Term term, int offset), and for these offsets to be stored so that they can be used when building PhrasePositions. Would this be a useful feature?
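To make the offset idea concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class PhraseOffsetSketch and its methods index() and matches() are hypothetical names made up for illustration, and the real PhrasePositions machinery works on posting lists rather than in-memory maps. The only point is that a phrase matches when every term is found at "start + its offset" for some common start position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Lucene's PhraseQuery or PhrasePositions.
class PhraseOffsetSketch {

    // Map each term to the list of positions at which it occurs.
    static Map<String, List<Integer>> index(String... tokens) {
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            positions.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return positions;
    }

    // The phrase matches when some start position exists such that
    // terms[i] occurs at start + offsets[i] for every i.
    static boolean matches(Map<String, List<Integer>> positions,
                           String[] terms, int[] offsets) {
        List<Integer> first = positions.get(terms[0]);
        if (first == null) {
            return false;
        }
        for (int p : first) {
            int start = p - offsets[0];
            boolean all = true;
            for (int i = 1; i < terms.length && all; i++) {
                List<Integer> ps = positions.get(terms[i]);
                all = ps != null && ps.contains(start + offsets[i]);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> doc = index("phone", "the", "boy");
        String[] phrase = {"phone", "boy"};
        // Adjacent offsets (0, 1): "boy" is not right after "phone".
        System.out.println(matches(doc, phrase, new int[] {0, 1})); // false
        // Offsets (0, 2): one arbitrary word may sit in between.
        System.out.println(matches(doc, phrase, new int[] {0, 2})); // true
    }
}
```

With offsets 0 and 2 the query behaves like "phone X boy"; with the default consecutive offsets it behaves like an ordinary exact phrase.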

My questions were really academic in nature: I wanted to understand position increments and how they relate to searching. I definitely agree (and who could argue?) with Nutch and Google! Removing stop words is not a good thing, but smart handling of pervasive terms is important, as you have implemented in Nutch when not doing phrase queries, and in how the bi-gram stuff works.

It does seem handy, though, to avoid exact phrase matches on "phone boy" when a stop word has been removed, so patching StopFilter to put in the missing positions seems reasonable to me for now. Any objections to that?
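A rough sketch of what "putting in the missing positions" could look like, again as a self-contained toy rather than a patch. The names StopGapSketch, Tok, filter() and positions() are hypothetical, and the real StopFilter works on a TokenStream, not a list of strings. The assumption here is simply that each removed stop word adds one to the position increment of the next token that survives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical classes, not Lucene's Token or StopFilter.
class StopGapSketch {

    // A surviving token plus its distance from the previous surviving token.
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Drop stop words, but carry the gap they leave into the increment
    // of the next token that is kept.
    static List<Tok> filter(List<String> words, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (String w : words) {
            if (stopWords.contains(w)) {
                skipped++;
                continue;
            }
            out.add(new Tok(w, 1 + skipped));
            skipped = 0;
        }
        return out;
    }

    // Turn increments into absolute positions; the first token lands at 0.
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int current = -1;
        for (int i = 0; i < toks.size(); i++) {
            current += toks.get(i).positionIncrement;
            pos[i] = current;
        }
        return pos;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("the"));
        List<Tok> toks = filter(Arrays.asList("phone", "the", "boy"), stops);
        int[] pos = positions(toks);
        for (int i = 0; i < toks.size(); i++) {
            // Prints phone@0 then boy@2: the hole left by "the" is preserved.
            System.out.println(toks.get(i).text + "@" + pos[i]);
        }
    }
}
```

Because "boy" ends up at position 2 instead of 1, an exact phrase query for "phone boy" would no longer match text that originally read "phone the boy".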

Erik

