[ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629827#action_12629827 ]
Steven Rowe commented on LUCENE-1380:
-------------------------------------

As I said in the thread on java-user that spawned this issue:

<http://www.nabble.com/Replacing-FAST-functionality-at-sesam.no---ShingleFilter%2B-exact-matching-td19396291.html>

(emphasis added):

{quote}
It works because you've set all of the shingles to be at the same position - probably better to change the one instance of .setPositionIncrement(0) to .setPositionIncrement(1) - that way, MultiPhraseQuery will not be invoked, and the standard disjunction thing should happen.

> [W]ould a patch to ShingleFilter that offers an option
> "unigramPositionIncrement" (that defaults to 1) likely be
> accepted into trunk?

The issue is not directly related to whether a unigram is involved, but rather whether or not _*tokens that begin at the same word*_ are given the same position. The option thus should be named something like "coterminalPositionIncrement". This seems like a reasonable addition, and a patch likely would be accepted, if it included unit tests.
{quote}

You have used the option name I suggested, but have implemented it in a form that doesn't follow the name -- in your implementation, *all* tokens are placed at the same position, not just those that start at the same word -- and I think this form is inappropriate for the general user.

I'm -1 on the patch in its current form. If rewritten to modify the position increment only for those shingles that begin at the same word, I'd be +1 (assuming it works and is tested appropriately).

> Patch for ShingleFilter.coterminalPositionIncrement
> ---------------------------------------------------
>
>                 Key: LUCENE-1380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1380
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Michael Semb Wever
>             Fix For: 2.4
>
>         Attachments: LUCENE-1380.patch
>
>
> Make it possible for *all* words and shingles to be placed at the same
> position.
> Default is to place each shingle at the same position as the unigram (or
> first shingle if outputUnigrams=false). That is, each coterminal token has
> positionIncrement=1 and every other token a positionIncrement=0.
> This leads to a MultiPhraseQuery where at least one word/shingle must be
> matched from each word/token. This is not always desired.
> See http://comments.gmane.org/gmane.comp.jakarta.lucene.user/34746 for
> mailing list thread.
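
For illustration only, here is a minimal sketch of the form requested in the comment: the increment is changed only for shingles that begin at the same word as the preceding token, and the first token at each word always advances by 1. It does not use Lucene and is not the attached patch; the class CoterminalSketch, the method shingles, and the parameter coterminalPositionIncrement are hypothetical names, and unigram output with a maximum shingle size is assumed.

```java
import java.util.ArrayList;
import java.util.List;

public class CoterminalSketch {

    /**
     * Builds unigrams and shingles for the given words and returns them as
     * "position:text" strings. Hypothetical stand-in for a token stream; it
     * only models position increments, not offsets or token types.
     */
    static List<String> shingles(List<String> words, int maxShingleSize,
                                 int coterminalPositionIncrement) {
        List<String> out = new ArrayList<>();
        int position = -1; // the first token's increment of 1 lands it at position 0
        for (int start = 0; start < words.size(); start++) {
            StringBuilder text = new StringBuilder();
            for (int size = 1;
                 size <= maxShingleSize && start + size <= words.size();
                 size++) {
                if (size > 1) {
                    text.append(' ');
                }
                text.append(words.get(start + size - 1));
                // The first token at each word always advances by 1; only the
                // further tokens that begin at that same word use the option.
                int increment = (size == 1) ? 1 : coterminalPositionIncrement;
                position += increment;
                out.add(position + ":" + text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("please", "divide", "this");
        System.out.println(shingles(words, 2, 0));
        System.out.println(shingles(words, 2, 1));
    }
}
```

With an increment of 0, "please" and "please divide" share position 0 and "divide" and "divide this" share position 1; with an increment of 1, every token gets its own position. In neither case do tokens that begin at different words collapse onto one position, which is the distinction the -1 above turns on.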