[ 
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504540#comment-16504540
 ] 

Alan Woodward commented on LUCENE-8273:
---------------------------------------

Both of these failures are caused by ShingleFilter not handling graph token 
streams properly.  When it isn't wrapped in a condition, ShingleFilter mangles 
its input graph, but it does so consistently, so the ValidatingTokenFilter is 
happy.  When the condition randomly switches it off, however, the 
ValidatingTokenFilter occasionally sees the plain input graph instead of the 
mangled one, and it complains because the offsets are no longer consistent.

I'm not quite sure how best to fix this.  Ideally we'd just fix ShingleFilter, 
but that's not as simple as it sounds.  Perhaps the easiest option is to add 
ShingleFilter to the blacklist and document that ConditionalTokenFilter won't 
work with broken graph inputs?



> Add a ConditionalTokenFilter
> ----------------------------
>
>                 Key: LUCENE-8273
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8273
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: 7.4
>
>         Attachments: LUCENE-8273-2.patch, LUCENE-8273-2.patch, 
> LUCENE-8273-part2-rebased.patch, LUCENE-8273-part2-rebased.patch, 
> LUCENE-8273-part2.patch, LUCENE-8273-part2.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, 
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter 
> in such a way that it could optionally be bypassed based on the current state 
> of the TokenStream.  This could be used to, for example, only apply 
> WordDelimiterFilter to terms that contain hyphens.
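
To make the hyphen example in the description concrete, here is a minimal 
sketch of the bypass idea in plain Java.  It does not use any Lucene classes: 
ConditionalSketch, apply and splitOnHyphen are hypothetical names, tokens are 
bare strings with no offsets or position attributes, and splitOnHyphen is only 
a toy stand-in for a word-delimiter step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration only; not the Lucene ConditionalTokenFilter API.
public class ConditionalSketch {

    // Applies `wrapped` only to tokens matching `condition`; every other
    // token bypasses it and is passed through unchanged.
    static List<String> apply(List<String> tokens,
                              Predicate<String> condition,
                              Function<String, List<String>> wrapped) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (condition.test(token)) {
                out.addAll(wrapped.apply(token));
            } else {
                out.add(token);
            }
        }
        return out;
    }

    // Toy stand-in for a word-delimiter step: splits a term on hyphens.
    static List<String> splitOnHyphen(String token) {
        return Arrays.asList(token.split("-"));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("wi-fi", "router", "plug-in");
        List<String> result =
            apply(tokens, t -> t.contains("-"), ConditionalSketch::splitOnHyphen);
        System.out.println(result); // prints [wi, fi, router, plug, in]
    }
}
```

Note that whatever consumes the output sees a mix of tokens that went through 
the wrapped step and tokens that bypassed it.  That is harmless in this toy 
model, but it is the situation in which a downstream consistency check can 
trip if the wrapped filter alters the stream in a way the bypassed tokens do 
not share.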



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

