[ https://issues.apache.org/jira/browse/LUCENE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641704#comment-13641704 ]

Robert Muir commented on LUCENE-4955:
-------------------------------------

+1 Adrien. These analysis components should either be fixed or removed.

We can speed up the process now by changing IndexWriter to reject this kind of
bogus shit. We shouldn't be putting broken data into, e.g., term vectors.
Rejecting it at index time should encourage the fixing process.
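A minimal sketch of the kind of index-time check proposed here, assuming tokens are validated in stream order. `TokenStreamCheck`, its `Token` record, and the three rules (no negative position increment, `endOffset >= startOffset`, start offsets never going backwards) are hypothetical illustrations, not IndexWriter's actual behavior or API.

```java
import java.util.List;

/**
 * Hypothetical sketch of an index-time sanity check on token data.
 * The class, the Token record, and the rules are illustrative assumptions,
 * not Lucene's IndexWriter code.
 */
public class TokenStreamCheck {

    /** Minimal stand-in for one token: position increment plus character offsets. */
    public record Token(String term, int positionIncrement, int startOffset, int endOffset) {}

    /** Throws IllegalArgumentException on the first token that looks broken. */
    public static void validate(List<Token> tokens) {
        int lastStart = 0;
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            if (t.positionIncrement() < 0) {
                throw new IllegalArgumentException("token " + i + " (" + t.term()
                        + "): negative position increment " + t.positionIncrement());
            }
            if (t.startOffset() < 0 || t.endOffset() < t.startOffset()) {
                throw new IllegalArgumentException("token " + i + " (" + t.term()
                        + "): invalid offsets [" + t.startOffset() + ", " + t.endOffset() + ")");
            }
            // Offsets are checked in stream order: a start offset may repeat but never decrease.
            if (t.startOffset() < lastStart) {
                throw new IllegalArgumentException("token " + i + " (" + t.term()
                        + "): start offset " + t.startOffset()
                        + " goes backwards (previous " + lastStart + ")");
            }
            lastStart = t.startOffset();
        }
    }

    public static void main(String[] args) {
        // Grams of "abcd" emitted size by size, each with its own position.
        List<Token> perGram = List.of(
                new Token("a", 1, 0, 1), new Token("b", 1, 1, 2), new Token("c", 1, 2, 3),
                new Token("d", 1, 3, 4), new Token("ab", 1, 0, 2));
        try {
            validate(perGram);
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In `main`, the ngram-style stream is rejected at `ab`, whose start offset 0 follows `d` at offset 3.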
                
> NGramTokenFilter increments positions for each gram
> ---------------------------------------------------
>
>                 Key: LUCENE-4955
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4955
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 4.3
>            Reporter: Simon Willnauer
>             Fix For: 5.0, 4.4
>
>         Attachments: highlighter-test.patch, LUCENE-4955.patch
>
>
> NGramTokenFilter increments positions for each gram rather than for the actual
> token, which can lead to rather funny problems, especially with highlighting.
> Whether this filter should be used for highlighting is a different story, but
> today it seems to be common practice in many situations to highlight sub-term
> matches.
> I have a highlighting test that uses ngrams and fails with a StringIOOB
> (StringIndexOutOfBoundsException): tokens are sorted by position, and the
> positions assigned by the ngram token filter cause the offsets to be mixed up.
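To make the report concrete, here is a dependency-free Java simulation of the grams of a single token `abcd` with gram sizes 1 to 2. `NGramPositions`, `perGramPositions`, and `perTokenPosition` are hypothetical names, not Lucene API. The per-gram variant assumes grams are emitted size by size with each gram advancing the position by one, which is our reading of the reported behavior; the per-token variant emits grams in start-offset order and keeps them all at the token's single position. Ordering the per-gram output by position makes start offsets jump back (0, 1, 2, 3, 0, 1, 2), plausibly the kind of offset mix-up the highlighter test runs into.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical simulation of two ways an n-gram filter could assign positions
 * to the grams of one input token. This is NOT Lucene code.
 */
public class NGramPositions {

    /** One emitted gram: its text, absolute position, and character offsets. */
    public record Gram(String term, int position, int startOffset, int endOffset) {}

    /** Per-gram scheme (assumed): grams emitted size by size, each one a new position. */
    public static List<Gram> perGramPositions(String token, int tokStart, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        int position = -1;
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= token.length(); start++) {
                position++; // position increment of 1 for every gram
                out.add(new Gram(token.substring(start, start + size), position,
                        tokStart + start, tokStart + start + size));
            }
        }
        return out;
    }

    /** Per-token scheme: grams emitted in start-offset order, all at the token's position. */
    public static List<Gram> perTokenPosition(String token, int tokStart, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        int position = 0; // the single position of the original token
        for (int start = 0; start < token.length(); start++) {
            for (int size = minGram; size <= maxGram && start + size <= token.length(); size++) {
                out.add(new Gram(token.substring(start, start + size), position,
                        tokStart + start, tokStart + start + size));
            }
        }
        return out;
    }

    /** True if, after a stable sort by position, start offsets never decrease. */
    public static boolean offsetsMonotonicByPosition(List<Gram> grams) {
        List<Gram> sorted = new ArrayList<>(grams);
        sorted.sort(Comparator.comparingInt(Gram::position)); // List.sort is stable
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).startOffset() < sorted.get(i - 1).startOffset()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Gram> perGram = perGramPositions("abcd", 0, 1, 2);
        List<Gram> perToken = perTokenPosition("abcd", 0, 1, 2);
        System.out.println("per-gram positions: " + perGram);
        System.out.println("  offsets monotonic? " + offsetsMonotonicByPosition(perGram));
        System.out.println("per-token position: " + perToken);
        System.out.println("  offsets monotonic? " + offsetsMonotonicByPosition(perToken));
    }
}
```

Only the per-token variant keeps offsets non-decreasing when grams are ordered by position. Sharing a position alone would not be enough in this model: a stable sort leaves same-position grams in emission order, so the grams also have to be emitted in offset order.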

--
This message is automatically generated by JIRA.
