improve BaseTokenStreamTestCase to test end()
---------------------------------------------

                 Key: LUCENE-2219
                 URL: https://issues.apache.org/jira/browse/LUCENE-2219
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis, contrib/analyzers
    Affects Versions: 3.0
            Reporter: Robert Muir


If offsetAtt/end() is not implemented correctly, then there can be problems 
with highlighting: see LUCENE-2207 for an example with CJKTokenizer.

In my opinion you currently have to write too much code to test this.

This patch does the following:
* adds an optional Integer finalOffset (null means no checking) to 
assertTokenStreamContents
* in assertAnalyzesTo, automatically fills this in with the input String's 
length()
In my opinion this is correct. For assertTokenStreamContents the check 
should be optional, since the stream may not even have a Tokenizer. If you 
are using assertTokenStreamContents with a Tokenizer, simply provide the 
extra expected value to check it.

For assertAnalyzesTo, it is implied that there is a Tokenizer, so the final 
offset should always be checked.
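
As a rough illustration of the two behaviors described above, here is a 
minimal, self-contained sketch. It does not use the Lucene classes: 
MiniTokenStream, WhitespaceStream and the correctEnd flag are hypothetical 
stand-ins invented for this example, and the two assert methods only mimic 
the idea of the patch (an optional, nullable finalOffset in one, the input 
length filled in automatically in the other).

```java
// Hypothetical, simplified model; none of these types are the Lucene API.
final class FinalOffsetSketch {

    /** Minimal token stream: terms with offsets, plus end() to set the final offset. */
    interface MiniTokenStream {
        boolean incrementToken(); // advance to the next token; false when exhausted
        String term();
        int endOffset();
        void end();               // afterwards endOffset() should be the final offset
    }

    /** Whitespace tokenizer whose end() can be deliberately left broken. */
    static final class WhitespaceStream implements MiniTokenStream {
        private final String input;
        private final boolean correctEnd;
        private int pos = 0;
        private int end = 0;
        private String term;

        WhitespaceStream(String input, boolean correctEnd) {
            this.input = input;
            this.correctEnd = correctEnd;
        }

        public boolean incrementToken() {
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
            if (pos >= input.length()) return false;
            int start = pos;
            while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) pos++;
            end = pos;
            term = input.substring(start, end);
            return true;
        }

        public String term() { return term; }

        public int endOffset() { return end; }

        public void end() {
            // A correct end() reports the full input length (trailing whitespace included).
            // The broken variant leaves the offset at the end of the last token.
            if (correctEnd) end = input.length();
        }
    }

    /** Checks the terms; checks the final offset only when finalOffset is non-null. */
    static void assertStreamContents(MiniTokenStream ts, String[] expected, Integer finalOffset) {
        int i = 0;
        while (ts.incrementToken()) {
            if (i >= expected.length) throw new AssertionError("extra token: " + ts.term());
            if (!expected[i].equals(ts.term())) {
                throw new AssertionError("term " + i + ": expected " + expected[i] + ", got " + ts.term());
            }
            i++;
        }
        if (i != expected.length) throw new AssertionError("too few tokens: " + i);
        ts.end();
        if (finalOffset != null && finalOffset.intValue() != ts.endOffset()) {
            throw new AssertionError("finalOffset: expected " + finalOffset + ", got " + ts.endOffset());
        }
    }

    /** A tokenizer is implied here, so the input length is always used as the final offset. */
    static void assertAnalyzesTo(String input, boolean correctEnd, String[] expected) {
        assertStreamContents(new WhitespaceStream(input, correctEnd), expected, input.length());
    }
}
```

With an input that has trailing whitespace, such as "hello world  ", the 
broken variant passes when finalOffset is null but fails through 
assertAnalyzesTo, which is the kind of mistake the extra check is meant to 
catch.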

The tests pass for core, but there are failures in contrib even besides 
CJKTokenizer (apply Koji's patch from LUCENE-2207; it is correct). 
Specifically, ChineseTokenizer has a similar problem.



