[
https://issues.apache.org/jira/browse/LUCENE-8092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293791#comment-16293791
]
Uwe Schindler commented on LUCENE-8092:
---------------------------------------
This is indeed a funny edge-case. Maybe the second filter could alternatively
reorder the tokens.
But I am not sure this is really a problem in real-world scenarios.
It may affect all token filters (not only those 2) that inject tokens with a
different offset and length. I think the above combination is useless in the
real world, but maybe others make sense?
An alternative would be to reorder tokens as the last step in the analysis chain
(using a "fixing" token filter). It would just reorder all tokens with the same
position so that their offsets increase.
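
To make the reordering idea concrete, here is a minimal sketch in plain Java.
It deliberately does not use the Lucene API: the Token class and the reorder
method are hypothetical stand-ins, and a real filter would have to buffer
token state per position inside incrementToken(). It only shows the ordering
rule itself (same position, ascending start offset, then end offset), and it
assumes the input is already ordered by position, as a token stream is.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffsetOrderSketch {

    /** Hypothetical token holder; NOT Lucene's attribute-based token model. */
    public static final class Token {
        public final String term;
        public final int position;     // absolute position in the stream
        public final int startOffset;
        public final int endOffset;

        public Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Returns a copy of the list in which tokens sharing a position are
     * ordered by start offset, then by end offset. The input is not modified.
     */
    public static List<Token> reorder(List<Token> tokens) {
        List<Token> out = new ArrayList<>(tokens);
        // Position is the primary key, so tokens never move across positions;
        // List.sort is stable, so fully equal tokens keep their relative order.
        out.sort(Comparator.comparingInt((Token t) -> t.position)
                .thenComparingInt(t -> t.startOffset)
                .thenComparingInt(t -> t.endOffset));
        return out;
    }
}
```

Note that this only fixes the order within one position. If offsets go
backwards between different positions, this sketch leaves them as they are.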
> TestRandomChains failure
> ------------------------
>
> Key: LUCENE-8092
> URL: https://issues.apache.org/jira/browse/LUCENE-8092
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Alan Woodward
>
> https://builds.apache.org/job/Lucene-Solr-NightlyTests-7.2/1/
> ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChains
> -Dtests.seed=C006DAD2E1FC77AF -Dtests.multiplier=2 -Dtests.nightly=true
> -Dtests.slow=true
> -Dtests.linedocsfile=/Users/romseygeek/projects/lucene-test-data/enwiki.random.lines.txt
> -Dtests.locale=tr -Dtests.timezone=Europe/Simferopol -Dtests.asserts=true
> -Dtests.file.encoding=UTF-8
> Reproduces locally on 7.2
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)