Jim Ferenczi created LUCENE-7708:
------------------------------------
Summary: Track PositionLengthAttribute abuse
Key: LUCENE-7708
URL: https://issues.apache.org/jira/browse/LUCENE-7708
Project: Lucene - Core
Issue Type: Bug
Components: core/queryparser, modules/analysis
Reporter: Jim Ferenczi
Some token filters use the position length attribute of the token stream to
encode the number of terms they put in a single token.
This breaks query parsing because it creates a disconnected graph.
I've tracked the abuse down to two candidates:
* ShingleFilter, which sets the position length attribute to the length of the
shingle.
* CJKBigramFilter, which always sets the position length attribute to 2.
I don't think these filters should set the position length at all, so the best
fix would be to remove the attribute from these token filters, but this could
break backward compatibility (BWC).
Still, this is a serious bug, since shingles and CJK bigrams now produce
invalid queries.
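
To illustrate the disconnected graph, here is a minimal sketch that does not
use Lucene at all. The Token class and the isConnected helper are hypothetical
stand-ins, and the check assumes a stream without position gaps. It treats each
token as an edge from its start position to start + position length, and
reports whether every token starts at a node reachable from position 0. The
two bigrams "a b" and "b c" model shingles over "a b c" without unigrams.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PositionGraphCheck {

    /** Hypothetical stand-in for a token's attributes; not a Lucene class. */
    static final class Token {
        final String term;
        final int posInc; // position increment relative to the previous token
        final int posLen; // number of positions the token claims to span

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /**
     * Returns true if every token starts at a node that is reachable from
     * position 0 by following earlier tokens. Assumes no position gaps
     * (for example, no removed stop words).
     */
    static boolean isConnected(List<Token> tokens) {
        Set<Integer> reachable = new HashSet<>();
        reachable.add(0);
        int pos = -1;
        for (Token t : tokens) {
            pos += t.posInc;
            if (!reachable.contains(pos)) {
                return false; // this token starts at an unreachable node
            }
            reachable.add(pos + t.posLen); // the token's end node
        }
        return true;
    }

    public static void main(String[] args) {
        // Bigrams with position length 2, as described in this issue.
        List<Token> abusive = Arrays.asList(
                new Token("a b", 1, 2),
                new Token("b c", 1, 2));
        // The same bigrams with the default position length of 1.
        List<Token> plain = Arrays.asList(
                new Token("a b", 1, 1),
                new Token("b c", 1, 1));

        System.out.println("posLen=2 connected: " + isConnected(abusive)); // false
        System.out.println("posLen=1 connected: " + isConnected(plain));   // true
    }
}
```

With position length 2, "a b" spans positions 0 to 2 while "b c" starts at
position 1, which no token leads to, so the graph is disconnected.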