[ https://issues.apache.org/jira/browse/LUCENE-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402314#comment-13402314 ]
Steven Rowe commented on LUCENE-4170:
-------------------------------------

bq. I'm not even sure what token ngramming should mean over an input graph.

A thought problem: run ShingleFilter with mingramsize=2, maxgramsize=3, outputUnigrams=true over input {{\[a/1] \[b/1] \[c/1] \[d/1]}} (where {{/n}} indicates poslength = {{n}}, and {{\[a b]}} indicates tokens {{a}} and {{b}} are at the same position; I'll omit the {{\[]}}'s below when only one token is at a given position), then run ShingleFilter again with the same config over the first ShingleFilter's output:

{noformat}
shinglefilter(min:2,max:3,unigrams:true) with input: a/1 b/1 c/1 d/1
"_" token sep:
[a/1 a_b/2 a_b_c/3]
[b/1 b_c/2 b_c_d/3]
[c/1 c_d/2]
d/1

shinglefilter(2,3,unigrams) with shinglefilter output above as input:
"=" token sep:
[a/1 a_b/2 a_b_c/3 a=b/2 a=b_c/3 a=b_c_d/4 a=b=c/3 a=b=c_d/4 a=b_c=d/4 a_b=c/3 a_b=c_d/4 a_b=c=d/4 a_b_c=d/4]
[b/1 b_c/2 b_c_d/3 b=c/2 b=c_d/3 b_c=d/3]
[c/1 c_d/2 c=d/2]
d/1
{noformat}

> TestRandomChains fail with Shingle+CommonGrams
> ----------------------------------------------
>
>                 Key: LUCENE-4170
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4170
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Robert Muir
>         Attachments: LUCENE-4170.patch
>
>
> ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChains
> -Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt
> -Dtests.timezone=America/Argentina/Salta -Dargs="-Dfile.encoding=ISO8859-1"
> This test has two shinglefilters, then a common-grams filter. I think posLen
> impls in commongrams and/or shingle has a bug if the input is already a graph.
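
One possible reading of the thought problem above is "shingle along every path of the graph": a shingle starting at a token continues with any token whose position equals the start position plus the accumulated poslength. The sketch below is only an illustration of that reading, not how ShingleFilter is implemented. The names {{Tok}}, {{shingle}} and {{extend}} are hypothetical, and positions are stored as absolute ints rather than through Lucene's position increment and position length attributes.

```java
import java.util.ArrayList;
import java.util.List;

class GraphShingleSketch {
    /** Hypothetical token: term text, absolute start position, position length. */
    static final class Tok {
        final String term;
        final int pos;
        final int posLen;

        Tok(String term, int pos, int posLen) {
            this.term = term;
            this.pos = pos;
            this.posLen = posLen;
        }

        @Override
        public String toString() {
            return term + "/" + posLen;
        }
    }

    /** Emits each input token (unigrams) plus shingles of minSize..maxSize tokens along every path. */
    static List<Tok> shingle(List<Tok> in, int minSize, int maxSize, String sep) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(t); // outputUnigrams=true
            extend(in, t.pos, t.term, t.posLen, 1, minSize, maxSize, sep, out);
        }
        return out;
    }

    /** Appends every token that starts where the current partial shingle ends, then recurses. */
    private static void extend(List<Tok> in, int startPos, String term, int len, int size,
                               int minSize, int maxSize, String sep, List<Tok> out) {
        if (size == maxSize) {
            return; // shingle already holds the maximum number of tokens
        }
        int nextPos = startPos + len;
        for (Tok next : in) {
            if (next.pos != nextPos) {
                continue; // not adjacent in the graph
            }
            String joined = term + sep + next.term;
            int joinedLen = len + next.posLen; // poslength of a shingle is the sum of its parts
            if (size + 1 >= minSize) {
                out.add(new Tok(joined, startPos, joinedLen));
            }
            extend(in, startPos, joined, joinedLen, size + 1, minSize, maxSize, sep, out);
        }
    }

    public static void main(String[] args) {
        List<Tok> input = List.of(new Tok("a", 0, 1), new Tok("b", 1, 1),
                                  new Tok("c", 2, 1), new Tok("d", 3, 1));
        List<Tok> first = shingle(input, 2, 3, "_");
        List<Tok> second = shingle(first, 2, 3, "=");
        System.out.println(first);
        System.out.println(second);
    }
}
```

Tokens come out grouped by start position but not in the same order within a position as in the listing above. Under this path-wise reading the second pass also emits {{b=c=d/3}} at {{b}}'s position, which the listing above does not include; whether that token should be there depends on what shingling over a graph is taken to mean.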