[jira] Issue Comment Edited: (LUCENE-1380) Patch for ShingleFilter.enablePositions

Michael Semb Wever (JIRA) Sun, 14 Sep 2008 06:00:17 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630885#action_12630885 ]


michaelsembwever edited comment on LUCENE-1380 at 9/14/08 5:58 AM:
---------------------------------------------------------------------

> All this patch does is to set all position increment of the tokens produced 
> by the ShingleFilter to 0, right? 
> I'm going to remove this for 2.4 fix and recommend you to use the filter 
> strategy mentioned. 

Adding the new TokenFilter isn't as easy as ABC: Lucene needs the filter 
class added to the classpath, and Solr needs a TokenFilterFactory added 
before the filter can be referenced from its configuration files. That is a 
lot of work when we (almost) agree that removing positional information from 
all tokens makes sense when using the ShingleFilter.
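The "filter strategy" discussed above amounts to a TokenFilter placed after 
the ShingleFilter that rewrites position increments. Below is a minimal 
sketch of that idea in plain Java, not written against the Lucene 2.4 
TokenStream API; SimpleToken and ZeroPositionsFilter are hypothetical names. 
It assumes the first token keeps its increment and every later token gets 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained model of a token: a term plus its
// position increment. This is NOT Lucene's Token class.
final class SimpleToken {
    final String term;
    final int positionIncrement;

    SimpleToken(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }
}

// Sketch of the "filter strategy": pass every token through unchanged,
// except that every token after the first gets positionIncrement=0,
// so all words and shingles end up stacked at a single position.
final class ZeroPositionsFilter {
    static List<SimpleToken> apply(List<SimpleToken> input) {
        List<SimpleToken> out = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            SimpleToken t = input.get(i);
            // Assumption of this sketch: keep the first token's increment so
            // the stream starts at a valid position; zero out all the rest.
            int inc = (i == 0) ? t.positionIncrement : 0;
            out.add(new SimpleToken(t.term, inc));
        }
        return out;
    }
}
```

Keeping the first increment untouched is a choice of this sketch; the quoted 
proposal only says that all increments become 0.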

If it were just the one installation, I wouldn't have a problem with adding 
the custom TokenFilter, but because our use case is an open-source, documented 
system (see http://sesat.no/howto-solr-query-evaluation.html), I'd like to 
make it as easy as possible for third parties.

I would also think that, because this is a way to replace commercial, 
competing technology from FAST, the community would be behind such an 
enhancement...

> Patch for ShingleFilter.enablePositions
> ---------------------------------------
>
>                 Key: LUCENE-1380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1380
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Michael Semb Wever
>            Assignee: Karl Wettin
>            Priority: Trivial
>         Attachments: LUCENE-1380.patch, LUCENE-1380.patch
>
>
> Make it possible for *all* words and shingles to be placed at the same 
> position.
> The default is to place each shingle at the same position as the unigram 
> (or the first shingle if outputUnigrams=false). That is, each coterminal 
> token has positionIncrement=1 and every other token has positionIncrement=0. 
> This leads to a MultiPhraseQuery in which at least one word or shingle must 
> match at each position. This is not always desired. 
> See http://comments.gmane.org/gmane.comp.jakarta.lucene.user/34746 for the 
> mailing-list thread.
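To make the default concrete, the sketch below turns position increments into 
absolute positions and groups the terms that share one; roughly, each group 
is the set of alternatives a MultiPhraseQuery needs one match from. It is 
plain Java, not Lucene code; PositionGroups is a hypothetical name, and the 
example stream is hand-written to follow the default described above 
(unigram at increment 1, its shingle at increment 0).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Self-contained sketch (not Lucene code): given terms and their position
// increments, compute each term's absolute position and group the terms
// that share a position, in the order positions first appear.
final class PositionGroups {
    static Map<Integer, List<String>> group(String[] terms, int[] increments) {
        if (terms.length != increments.length) {
            throw new IllegalArgumentException("terms and increments differ in length");
        }
        Map<Integer, List<String>> byPosition = new LinkedHashMap<>();
        int position = -1; // a first token with increment 1 lands at position 0
        for (int i = 0; i < terms.length; i++) {
            position += increments[i];
            byPosition.computeIfAbsent(position, k -> new ArrayList<>()).add(terms[i]);
        }
        return byPosition;
    }
}
```

Running it on please(1), please divide(0), divide(1), divide this(0), this(1) 
gives three groups, {0=[please, please divide], 1=[divide, divide this], 
2=[this]}, so a phrase needs a match at each of three positions. With 
increments 1, 0, 0, 0, 0 (every token stacked, as proposed) all five terms 
fall into position 0.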

