[ 
https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402307#comment-13402307
 ] 

Steven Rowe commented on LUCENE-4170:
-------------------------------------

bq. I think shingles has a similar bug: it doesn't look at the existing 
posLength of the input tokens at all, instead it just fills posLength with the 
builtGramSize.

I agree.
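
To make the point concrete, here is a minimal sketch in plain Java (not ShingleFilter's code; the class and method names are made up for illustration). It assumes the tokens in a shingle form a single path through the graph, so that each token starts where the previous one ends. Under that assumption the shingle's position length is the sum of its members' position lengths, which equals the gram size only when every input posLength is 1.

```java
public class ShinglePosLength {
    /**
     * Position length a shingle would need to report when its member tokens
     * form one path through the graph (each token starts where the previous
     * one ends): the sum of the members' own position lengths.
     * Hypothetical helper, not part of Lucene.
     */
    static int spanLength(int[] posLengths) {
        int span = 0;
        for (int posLength : posLengths) {
            span += posLength;
        }
        return span;
    }

    public static void main(String[] args) {
        // Linear input: every token has posLength 1, so span == gram size.
        int[] linear = {1, 1};
        // Graph input: the first token already spans two positions.
        int[] graph = {2, 1};
        System.out.println(linear.length + " vs " + spanLength(linear));
        System.out.println(graph.length + " vs " + spanLength(graph));
    }
}
```

For the linear case the gram size and the span agree (2 and 2); for the graph case they differ (2 and 3), which is the mismatch you get from filling posLength with the gram size.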

However, the problem isn't just position length: ShingleFilter has never 
handled input position increments of zero, so real graph compatibility will 
mean fixing that too.
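
As a rough illustration of why zero increments matter, the sketch below (plain Java, hypothetical names, not how ShingleFilter is implemented) compares bigrams built in stream order with bigrams built only between tokens at adjacent positions. It assumes every token has poslength 1 and that positions start at -1 before the first increment is applied.

```java
import java.util.ArrayList;
import java.util.List;

public class StackedTokens {
    /** Turns per-token position increments into absolute positions. */
    static int[] absolutePositions(int[] posIncs) {
        int[] positions = new int[posIncs.length];
        int current = -1; // position before the first token
        for (int i = 0; i < posIncs.length; i++) {
            current += posIncs[i];
            positions[i] = current;
        }
        return positions;
    }

    /** Bigrams over stream order, ignoring position increments entirely. */
    static List<String> naiveBigrams(String[] terms) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < terms.length; i++) {
            out.add(terms[i] + " " + terms[i + 1]);
        }
        return out;
    }

    /** Bigrams only between tokens whose positions differ by exactly one. */
    static List<String> positionAwareBigrams(String[] terms, int[] posIncs) {
        int[] positions = absolutePositions(posIncs);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int j = 0; j < terms.length; j++) {
                if (positions[j] == positions[i] + 1) {
                    out.add(terms[i] + " " + terms[j]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "fast" is a synonym stacked on "quick" (position increment 0).
        String[] terms = {"quick", "fast", "fox"};
        int[] posIncs = {1, 0, 1};
        System.out.println(naiveBigrams(terms));
        System.out.println(positionAwareBigrams(terms, posIncs));
    }
}
```

The stream-order version pairs "quick" with "fast" even though they occupy the same position, while the position-aware version yields "quick fox" and "fast fox".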

I think Karl Wettin's ShingleMatrixFilter (deprecated in 3.6, dropped in 4.0) 
was an attempt to permute all combinations of overlapping (poslength=1) terms to 
produce shingles.  ShingleMatrixFilter wouldn't handle poslength > 1, though.

I'm not even sure what token ngramming should mean over an input graph.  The 
trivial case where input tokens' poslength is always one and position 
increment is always one is obviously already handled.

I think both issues should be handled, since poslength > 1 will very likely be 
used with posincr = 0, e.g. synonyms and kuromoji de-compounding.

                
> TestRandomChains fail with Shingle+CommonGrams
> ----------------------------------------------
>
>                 Key: LUCENE-4170
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4170
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Robert Muir
>         Attachments: LUCENE-4170.patch
>
>
> ant test  -Dtestcase=TestRandomChains -Dtests.method=testRandomChains 
> -Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt 
> -Dtests.timezone=America/Argentina/Salta -Dargs="-Dfile.encoding=ISO8859-1"
>
> This test has two shinglefilters, then a common-grams filter. I think posLen 
> impls in commongrams and/or shingle has a bug if the input is already a graph.
