[ https://issues.apache.org/jira/browse/LUCENE-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293791#comment-16293791 ]

Uwe Schindler commented on LUCENE-8092:
---------------------------------------

This is indeed a funny edge case. Alternatively, the second filter could
reorder the tokens.

But I am not sure this is really a problem in real-world scenarios. It may
affect all token filters (not only these two) that inject tokens with a
different offset and length. I think the above combination is useless in the
real world, but maybe other combinations make sense?

An alternative would be to reorder the tokens as the last step in the analysis
chain (using a "fixing" token filter). Such a filter would just reorder all
tokens at the same position so that their offsets increase?
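The "fixing" step could look roughly like the following. This is a standalone sketch of the reordering idea only, not Lucene code: the `Token` record and `reorderWithinPositions` method are hypothetical names, and a real TokenFilter would presumably have to buffer each group of stacked tokens (for example with `AttributeSource.captureState()`) before emitting them. It groups tokens that share a position (position increment 0), sorts each group by start offset, then end offset, and reassigns the position increments so that positions stay unchanged.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffsetReorderSketch {

    // Hypothetical minimal token model: term text, position increment, start/end offsets.
    record Token(String term, int posInc, int startOffset, int endOffset) {}

    // Reorders tokens stacked at the same position so that offsets are
    // non-decreasing within each position group. Positions are preserved:
    // the first token of a group carries the group's original increment,
    // the rest get an increment of 0.
    static List<Token> reorderWithinPositions(List<Token> in) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            // A group starts at the current token and extends over all following tokens with posInc == 0.
            int groupStart = i;
            int groupInc = in.get(i).posInc();
            i++;
            while (i < in.size() && in.get(i).posInc() == 0) {
                i++;
            }
            List<Token> group = new ArrayList<>(in.subList(groupStart, i));
            // Sort by start offset first, then end offset (List.sort is stable).
            group.sort(Comparator.comparingInt(Token::startOffset)
                    .thenComparingInt(Token::endOffset));
            for (int k = 0; k < group.size(); k++) {
                Token t = group.get(k);
                out.add(new Token(t.term(), k == 0 ? groupInc : 0, t.startOffset(), t.endOffset()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "bar" and "foo" are injected at the same position as "foobar" but out of offset order.
        List<Token> tokens = List.of(
                new Token("foobar", 1, 0, 6),
                new Token("bar", 0, 3, 6),
                new Token("foo", 0, 0, 3),
                new Token("baz", 1, 7, 10));
        for (Token t : reorderWithinPositions(tokens)) {
            System.out.println(t);
        }
    }
}
```

Note that this only orders tokens within one position; it does not by itself guarantee that offsets never go backwards from one position to the next.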

> TestRandomChains failure
> ------------------------
>
>                 Key: LUCENE-8092
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8092
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Alan Woodward
>
> https://builds.apache.org/job/Lucene-Solr-NightlyTests-7.2/1/
> ant test  -Dtestcase=TestRandomChains -Dtests.method=testRandomChains 
> -Dtests.seed=C006DAD2E1FC77AF -Dtests.multiplier=2 -Dtests.nightly=true 
> -Dtests.slow=true 
> -Dtests.linedocsfile=/Users/romseygeek/projects/lucene-test-data/enwiki.random.lines.txt
>  -Dtests.locale=tr -Dtests.timezone=Europe/Simferopol -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
> Reproduces locally on 7.2



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
