[jira] Commented: (LUCENE-2014) position increment bug: smartcn

Robert Muir (JIRA) Thu, 29 Oct 2009 02:09:24 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771343#action_12771343
 ]


Robert Muir commented on LUCENE-2014:
-------------------------------------

Uwe exactly. so only remaining question is, do you think I should change this 
filter to use capture/restoreState api instead of using clearAttributes?

I guess the only advantage would be that it would preserve any customAttributes 
or payloads that someone might add after the SentenceTokenizer, but before the 
WordTokenFilter propagating them downto the individual words.


> position increment bug: smartcn
> -------------------------------
>
>                 Key: LUCENE-2014
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2014
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>         Attachments: LUCENE-2014.patch, LUCENE-2014.patch
>
>
> If i use LUCENE_VERSION >= 2.9 with smart chinese analyzer, it will crash 
> indexwriter with any reasonable amount of chinese text.
> its especially annoying because it happens in 2.9.1 RC as well.
> this is because the position increments for tokens after stopwords are bogus:
> Here's an example (from test case), where the position increment should be 2, 
> but is instead 91975314!
> {code}
>   public void testChineseStopWords2() throws Exception {
>     Analyzer ca = new SmartChineseAnalyzer(Version.LUCENE_CURRENT); /* will 
> load stopwords */
>     String sentence = "Title:San"; // : is a stopword
>     String result[] = { "titl", "san"};
>     int startOffsets[] = { 0, 6 };
>     int endOffsets[] = { 5, 9 };
>     int posIncr[] = { 1, 2 };
>     assertAnalyzesTo(ca, sentence, result, startOffsets, endOffsets, posIncr);
>   }
> {code}
> junit.framework.AssertionFailedError: posIncrement 1 expected:<2> but 
> was:<91975314>
>       at junit.framework.Assert.fail(Assert.java:47)
>       at junit.framework.Assert.failNotEquals(Assert.java:280)
>       at junit.framework.Assert.assertEquals(Assert.java:64)
>       at junit.framework.Assert.assertEquals(Assert.java:198)
>       at 
> org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:83)
>       ...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-2014) position increment bug: smartcn

Reply via email to