Ramesh,

I haven't closely examined the code that does this positioning, but this is how I 
believe it works:

Let's say you had a token stream that returned the tokens "you", "are", "running", 
"faster", "than", "me" and didn't make any setPositionIncrement calls. The default 
increment is 1, so the tokens sit at positions 1 through 6.

Each token in the stream gets a position, which allows you to do things like proximity 
searches. The query "are than"~3 would find the document that token stream came from, 
since "are" occurs at position 2 and "than" at position 5, and 5-2 <= 3.
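To make the arithmetic concrete, here is a minimal plain-Java sketch of how I picture positions being accumulated from increments. It does not call Lucene at all; the class and method names (PositionSketch, positions, within) are made up for illustration, and the within check mirrors the simplified subtraction used in this message rather than Lucene's actual slop computation.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionSketch {
    // Assign each token a position by summing the position increments.
    // Counting starts at 1 here to match the numbers in this message.
    static List<Integer> positions(int[] increments) {
        List<Integer> result = new ArrayList<>();
        int pos = 0;
        for (int inc : increments) {
            pos += inc;
            result.add(pos);
        }
        return result;
    }

    // Simplified proximity test (an assumption, not Lucene's algorithm):
    // two tokens match if their positions differ by at most `slop`.
    static boolean within(int posA, int posB, int slop) {
        return Math.abs(posB - posA) <= slop;
    }

    public static void main(String[] args) {
        String[] tokens = {"you", "are", "running", "faster", "than", "me"};
        int[] increments = {1, 1, 1, 1, 1, 1}; // default increment for every token
        List<Integer> pos = positions(increments);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " @ " + pos.get(i));
        }
        // "are" is at index 1, "than" at index 4 in the token array.
        System.out.println("are/than within 3: " + within(pos.get(1), pos.get(4), 3));
    }
}
```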

Now let's say you wanted to stem "running" to "run" but keep the original token. You 
would create a token filter that inserted the stem "run" into the token stream when 
the "running" token occurred, but also kept the original token "running". If you didn't 
set the position increment on the added token, it would take a position of its own, 
"than" would move to position 6, and the distance between "are" and "than" would 
become 6-2 = 4, which is greater than 3, so your proximity query would fail.

When you set the position increment to zero for the added token, it gets treated as if 
it were at the same position as the original token, which keeps your proximity query 
from breaking.
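Here is a rough sketch of that difference, again in plain Java with no Lucene calls. The Tok class and the injectStem, positionOf and distance helpers are hypothetical names invented for this example; a real filter would extend Lucene's TokenFilter and call Token.setPositionIncrement on the token it adds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StemInjectionSketch {
    // Hypothetical minimal token: its text plus a position increment.
    static final class Tok {
        final String text;
        final int increment;
        Tok(String text, int increment) {
            this.text = text;
            this.increment = increment;
        }
    }

    // Emit every word with the default increment of 1, and emit `stem`
    // right after `original` using the supplied increment.
    static List<Tok> injectStem(List<String> words, String original,
                                String stem, int stemIncrement) {
        List<Tok> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Tok(w, 1));
            if (w.equals(original)) {
                out.add(new Tok(stem, stemIncrement));
            }
        }
        return out;
    }

    // Position of the first token with the given text, or -1 if absent.
    static int positionOf(List<Tok> stream, String text) {
        int pos = 0;
        for (Tok t : stream) {
            pos += t.increment;
            if (t.text.equals(text)) {
                return pos;
            }
        }
        return -1;
    }

    // Simplified distance, as in the message: position(b) - position(a).
    static int distance(List<Tok> stream, String a, String b) {
        return positionOf(stream, b) - positionOf(stream, a);
    }

    public static void main(String[] args) {
        List<String> words =
            Arrays.asList("you", "are", "running", "faster", "than", "me");
        List<Tok> shifted = injectStem(words, "running", "run", 1);
        List<Tok> stacked = injectStem(words, "running", "run", 0);
        System.out.println("increment 1: " + distance(shifted, "are", "than"));
        System.out.println("increment 0: " + distance(stacked, "are", "than"));
    }
}
```

With an increment of 1 the added token pushes every later token one position to the right; with an increment of 0 "run" and "running" share a position and the later tokens stay where they were.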

Proximity queries are the one place I know this affects. I'm unsure how the positions 
affect other parts of Lucene.

Hope I got all that right and that it helps you understand setPositionIncrement.

Eric

-----Original Message-----
From: Mailing Lists Account [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 13, 2003 7:07 AM
To: Lucene Users List
Subject: Re: Phrase query and porter stemmer


Hi Eric,

Thanks for the reply. The option of a custom token filter sounds good to
me. I am not sure what the advantage of the Token.setPositionIncrement()
option is. Let me look into the docs before I ask further questions on
this.

regards
Ramesh

Eric Isakson wrote:
> You won't get hits for "security" if you do not use the stemmer. The
> stem of "security" is the token that gets stored in the index.
>
> If you don't use the stemming algorithm when you create the index you
> could search for "security" and only get those documents that contain
> "security".
>
> See the FAQ
>
> http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q15
>
> If you have a list of terms you want to treat differently (i.e. you
> know there are certain words you don't want to stem) you could build
> a custom TokenFilter that checks the tokens for those words before
> applying the stemming algorithm then add that TokenFilter to your
> analyzer. You might also consider allowing the tokens to be stemmed
> and adding the original non-stemmed term at the same position using
> Token.setPositionIncrement(0). You might also want to figure out some
> way to boost the score on those non-stemmed tokens when you build
> your query (not sure how you might accomplish that, but some custom
> query parsing code could do the trick).
>
> Eric
>
> -----Original Message-----
> From: Mailing Lists Account [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 12, 2003 4:17 AM
> To: [EMAIL PROTECTED]
> Subject: Phrase query and porter stemmer
>
>
> Hi,
>
> I use PorterStemmer with my analyzer for indexing the documents.
> And I have been using the same analyzer for searching too.
>
> When I search for a phrase like "security" AND database, I would like
> to avoid matches for terms like "secure" or "securities". I observed
> that Google and a couple of other search engines do not return such
> matches.
>
> 1) In other words, in a single query, is it possible to skip the
>    Porter stemmer for phrase queries and use it for other queries
>    (such as Term query etc.)?
>
> 2) As an alternative, is it advisable to manually construct a
>    PhraseQuery by adding terms without applying the Porter stemmer?
>
> regards
> Ramesh
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]




