[ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629827#action_12629827 ]

Steven Rowe commented on LUCENE-1380:
-------------------------------------

As I said in the java-user thread that spawned this issue
<http://www.nabble.com/Replacing-FAST-functionality-at-sesam.no---ShingleFilter%2B-exact-matching-td19396291.html>
(emphasis added):

{quote}
It works because you've set all of the shingles to be at the same position - 
probably better to change the one instance of .setPositionIncrement(0) to 
.setPositionIncrement(1) - that way, MultiPhraseQuery will not be invoked, and 
the standard disjunction thing should happen.

> [W]ould a patch to ShingleFilter that offers an option
> "unigramPositionIncrement" (that defaults to 1) likely be
> accepted into trunk?

The issue is not directly related to whether a unigram is involved, but rather 
whether or not _*tokens that begin at the same word*_ are given the same 
position.  The option thus should be named something like 
"coterminalPositionIncrement".  This seems like a reasonable addition, and a 
patch likely would be accepted, if it included unit tests.
{quote}
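
To make the quoted point concrete, here is a minimal sketch in plain Java (no Lucene classes) of how position increments turn into absolute positions. The class and method names are hypothetical, and the sketch assumes the position counter starts at -1, so that a first token with increment 1 lands at position 0. Tokens that end up sharing a position are the ones that, per the quote above, lead to MultiPhraseQuery being invoked.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not part of Lucene or of the patch.
public class PositionGroupingSketch {

    /**
     * Groups terms by absolute position. Assumes the position counter starts
     * at -1, so a first token with increment 1 lands at position 0.
     */
    public static Map<Integer, List<String>> groupByPosition(String[] terms, int[] increments) {
        Map<Integer, List<String>> byPosition = new LinkedHashMap<>();
        int position = -1;
        for (int i = 0; i < terms.length; i++) {
            // An increment of 0 keeps the token at the previous token's position.
            position += increments[i];
            byPosition.computeIfAbsent(position, p -> new ArrayList<>()).add(terms[i]);
        }
        return byPosition;
    }

    public static void main(String[] args) {
        String[] terms = {"please", "please divide", "divide", "divide this", "this"};
        // Shingles stacked on their unigram: two terms share positions 0 and 1.
        System.out.println(groupByPosition(terms, new int[] {1, 0, 1, 0, 1}));
        // Every token advances the position: no two terms share a position.
        System.out.println(groupByPosition(terms, new int[] {1, 1, 1, 1, 1}));
    }
}
```

With increments 1, 0, 1, 0, 1 each shingle shares a position with the unigram it starts at; with all increments set to 1 every token gets its own position.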

You have used the option name I suggested, but have implemented it in a form 
that doesn't follow the name -- in your implementation, *all* tokens are placed 
at the same position, not just those that start at the same word -- and I think 
this form is inappropriate for the general user.

I'm -1 on the patch in its current form.  If rewritten to modify the position 
increment only for those shingles that begin at the same word, I'd be +1 
(assuming it works and is tested appropriately).
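
A rough sketch of the behavior I have in mind, in plain Java rather than as a ShingleFilter patch. Everything here is hypothetical: the class, the `shingle` method, and the simplification that unigrams are always output and always advance the position by 1. Only the shingles that begin at the same word as the preceding unigram receive the configurable increment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the ShingleFilter implementation.
public class CoterminalShingleSketch {

    /** A term plus its position increment relative to the previous token. */
    public static final class Token {
        public final String term;
        public final int positionIncrement;

        public Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Emits each word followed by the shingles that begin at that word.
     * The unigram always advances the position by 1; each shingle that
     * starts at the same word gets coterminalPositionIncrement instead.
     */
    public static List<Token> shingle(String[] words, int maxShingleSize,
                                      int coterminalPositionIncrement) {
        List<Token> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            out.add(new Token(words[start], 1));
            StringBuilder shingle = new StringBuilder(words[start]);
            for (int size = 2; size <= maxShingleSize && start + size <= words.length; size++) {
                shingle.append(' ').append(words[start + size - 1]);
                out.add(new Token(shingle.toString(), coterminalPositionIncrement));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = {"please", "divide", "this"};
        System.out.println(shingle(words, 2, 0)); // shingles stacked on their unigram
        System.out.println(shingle(words, 2, 1)); // every token at its own position
    }
}
```

Note that in neither setting do *all* tokens collapse onto a single position; the option only affects tokens that start at the same word.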

> Patch for ShingleFilter.coterminalPositionIncrement
> ---------------------------------------------------
>
>                 Key: LUCENE-1380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1380
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Michael Semb Wever
>             Fix For: 2.4
>
>         Attachments: LUCENE-1380.patch
>
>
> Make it possible for *all* words and shingles to be placed at the same 
> position.
> Default is to place each shingle at the same position as the unigram (or 
> first shingle if outputUnigrams=false). That is, each coterminal token has 
> positionIncrement=1 and every other token a positionIncrement=0. 
> This leads to a MultiPhraseQuery where at least one word/shingle must be 
> matched from each word/token. This is not always desired. 
> See http://comments.gmane.org/gmane.comp.jakarta.lucene.user/34746 for 
> mailing list thread.


