[ https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402307#comment-13402307 ]
Steven Rowe commented on LUCENE-4170:
-------------------------------------

bq. I think shingles has a similar bug: it doesn't look at the existing posLength of the input tokens at all; instead, it just fills posLength with the builtGramSize.

I agree. However, the problem isn't just position length: ShingleFilter has never handled input position increments of zero, so real graph compatibility will mean fixing that too.

I think Karl Wettin's ShingleMatrixFilter (deprecated in 3.6, dropped in 4.0) was an attempt to permute all combinations of overlapping (poslength=1) terms to produce shingles. ShingleMatrixFilter wouldn't handle poslength > 1, though.

I'm not even sure what token ngramming should mean over an input graph. The trivial case, where every input token's poslength is one and every position increment is one, is obviously already handled.

I think both issues should be handled, since poslength > 1 will very likely be used with posincr = 0, e.g. in synonyms and Kuromoji de-compounding.

> TestRandomChains fail with Shingle+CommonGrams
> ----------------------------------------------
>
> Key: LUCENE-4170
> URL: https://issues.apache.org/jira/browse/LUCENE-4170
> Project: Lucene - Java
> Issue Type: Bug
> Components: modules/analysis
> Reporter: Robert Muir
> Attachments: LUCENE-4170.patch
>
>
> ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChains -Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt -Dtests.timezone=America/Argentina/Salta -Dargs="-Dfile.encoding=ISO8859-1"
>
> This test has two shinglefilters, then a common-grams filter. I think the posLen impls in commongrams and/or shingle have a bug if the input is already a graph.
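One way to make the posLength question concrete: treat each token's (position increment, position length) pair as an arc in a graph, and define a shingle over a graph as the n tokens along one path through it. That definition is an assumption (the comment leaves open what ngramming over a graph should mean), and the classes and methods below (`GraphShingleSketch`, `Tok`, `Arc`, `toArcs`, `shingles`) are hypothetical stand-ins, not Lucene's `ShingleFilter` or `TokenStream` API. What the sketch tries to show is that a shingle's posLength would come from the positions its tokens actually cover (end node of the last token minus start node of the first), not from the gram size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch; these classes are NOT part of Lucene's analysis API.
final class GraphShingleSketch {

    /** A token as a TokenStream would describe it: term, position increment, position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** A token placed in the graph as an arc from node `from` to node `to`. */
    static final class Arc {
        final String term;
        final int from;
        final int to;

        Arc(String term, int from, int to) {
            this.term = term;
            this.from = from;
            this.to = to;
        }
    }

    /** Start node = running sum of posInc (a first posInc of 1 lands on 0); end node = start + posLen. */
    static List<Arc> toArcs(List<Tok> toks) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.posInc;
            arcs.add(new Arc(t.term, pos, pos + t.posLen));
        }
        return arcs;
    }

    /**
     * Assumption: a shingle over a graph is n arcs along one path. Each result is
     * "terms@startPos/posLen", with posLen taken from the nodes the path covers
     * rather than simply set to n.
     */
    static List<String> shingles(List<Arc> arcs, int n) {
        List<String> out = new ArrayList<>();
        for (Arc first : arcs) {
            List<Arc> path = new ArrayList<>();
            path.add(first);
            extend(arcs, n, path, out);
        }
        return out;
    }

    // Depth-first: append every arc that starts at the node where the current path ends.
    private static void extend(List<Arc> arcs, int n, List<Arc> path, List<String> out) {
        Arc first = path.get(0);
        Arc last = path.get(path.size() - 1);
        if (path.size() == n) {
            StringBuilder terms = new StringBuilder();
            for (Arc a : path) {
                if (terms.length() > 0) terms.append(' ');
                terms.append(a.term);
            }
            out.add(terms + "@" + first.from + "/" + (last.to - first.from));
            return;
        }
        for (Arc next : arcs) {
            if (next.from == last.to) {
                path.add(next);
                extend(arcs, n, path, out);
                path.remove(path.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        List<Tok> toks = new ArrayList<>();
        toks.add(new Tok("wifi", 1, 2));    // synonym spanning two positions
        toks.add(new Tok("wi", 0, 1));      // stacked on the same position (posInc = 0)
        toks.add(new Tok("fi", 1, 1));
        toks.add(new Tok("network", 1, 1));
        System.out.println(shingles(toArcs(toks), 2));
        // prints [wifi network@0/3, wi fi@0/2, fi network@1/2]
    }
}
```

With a two-position synonym `wifi` stacked over `wi fi`, the bigram `wifi network` starts at position 0 and ends at node 3, so its posLength is 3 even though it was built from 2 tokens; filling posLength with the gram size would report 2. The brute-force path enumeration is exponential in n and is only meant to illustrate the semantics, not to serve as an efficient filter.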