[ https://issues.apache.org/jira/browse/LUCENE-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883671#comment-15883671 ]
ASF subversion and git services commented on LUCENE-7708: --------------------------------------------------------- Commit 6c63df0b15f735907438514f3b4b702680d74588 in lucene-solr's branch refs/heads/branch_6x from [~jim.ferenczi] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6c63df0 ] LUCENE-7708: Fix position length attribute set by the ShingleFilter when outputUnigrams=false > Track PositionLengthAttribute abuse > ----------------------------------- > > Key: LUCENE-7708 > URL: https://issues.apache.org/jira/browse/LUCENE-7708 > Project: Lucene - Core > Issue Type: Bug > Components: core/queryparser, modules/analysis > Reporter: Jim Ferenczi > Attachments: LUCENE-7708.patch, LUCENE-7708.patch > > > Some token filters uses the position length attribute of the token stream to > encode the number of terms they put in a single token. > This breaks the query parsing because it creates disconnected graph. > I've tracked down the abusive case to 2 candidates: > * ShingleFilter which sets the position length attribute to the length of the > shingle. > * CJKBigramFilter which always sets the position length attribute to 2. > I don't think these filters should set the position length at all so the best > would be to remove the attribute from these token filters but this could > break BWC. > Though this is a serious bug since shingles and cjk bigram now produce > invalid queries. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org