inconsistency of tokenstream.end() with OffsetLimitTokenFilter and LimitTokenCountFilter ----------------------------------------------------------------------------------------
Key: LUCENE-3088
URL: https://issues.apache.org/jira/browse/LUCENE-3088
Project: Lucene - Java
Issue Type: Bug
Reporter: Robert Muir

In LUCENE-3064, we added some state and checks to MockTokenizer to validate that consumers are properly using the tokenstream workflow (described here: http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/analysis/TokenStream.html).

One inconsistency concerns the following steps:

4. The consumer calls incrementToken() until it returns false, consuming the attributes after each call.
5. The consumer calls end() so that any end-of-stream operations can be performed.

In the case of these limiting filters, end() is called on the Tokenizer *before* incrementToken() has returned false. This is a little strange for two reasons. One is that the tokenizer might not even be "ready" for end(); e.g. it might be coded so that end() only works correctly if it has been entirely consumed. The other problem, of course, is that the finalOffset (the main thing end() is used for) will most often be wrong in this case, so highlighting of multi-valued fields will not work.

We should probably figure out a way to address the inconsistency. Some ideas:
# fix the javadocs, perhaps documenting that end() could be called at any time, and accept that the finalOffset will be wrong.
# the limiting filters could consume the rest of the tokens in a while (incrementToken()) loop to ensure totally proper behavior.
# the limiting filters could do something tricky like override end() so that it is not invoked on the Tokenizer in a surprising state. This is still evil, but perhaps less evil than calling it "out of order".
# ...
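The effect of idea #2 can be illustrated with a small, self-contained Java model. This is a sketch, not Lucene's API: Stream, WhitespaceTokenizer, LimitFilter and consume are hypothetical names, and the toy tokenizer's end() simply records the character offset it has reached, as a simplified stand-in for how a real Tokenizer derives its finalOffset. It compares a limiting filter that stops pulling at the limit (so end() reaches a tokenizer that was stopped early) with one that drains the remaining tokens first.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitDrainDemo {

    // Hypothetical, heavily simplified stand-in for the TokenStream contract:
    // call incrementToken() until it returns false, then call end() once.
    interface Stream {
        boolean incrementToken();
        String term();
        void end();
        int finalOffset();
    }

    // Toy whitespace tokenizer (assumption: end() reports the offset actually
    // reached, so calling end() early yields a too-small finalOffset).
    static final class WhitespaceTokenizer implements Stream {
        private final String text;
        private int pos = 0;
        private String term;
        private int finalOffset = -1;

        WhitespaceTokenizer(String text) { this.text = text; }

        @Override public boolean incrementToken() {
            while (pos < text.length() && text.charAt(pos) == ' ') pos++;
            if (pos >= text.length()) return false;
            int start = pos;
            while (pos < text.length() && text.charAt(pos) != ' ') pos++;
            term = text.substring(start, pos);
            return true;
        }

        @Override public String term() { return term; }
        @Override public void end() { finalOffset = pos; }
        @Override public int finalOffset() { return finalOffset; }
    }

    // Hypothetical limiting filter. With drainRest=false it stops at the limit and
    // leaves the tokenizer unexhausted; with drainRest=true it discards the
    // remaining tokens in a while (incrementToken()) loop before reporting false.
    static final class LimitFilter implements Stream {
        private final Stream input;
        private final int maxTokens;
        private final boolean drainRest;
        private int count = 0;
        private boolean inputExhausted = false;

        LimitFilter(Stream input, int maxTokens, boolean drainRest) {
            this.input = input;
            this.maxTokens = maxTokens;
            this.drainRest = drainRest;
        }

        @Override public boolean incrementToken() {
            if (inputExhausted) return false;
            if (count < maxTokens) {
                if (input.incrementToken()) { count++; return true; }
                inputExhausted = true;
                return false;
            }
            if (drainRest) {
                while (input.incrementToken()) { /* discard tokens past the limit */ }
                inputExhausted = true;
            }
            return false;
        }

        @Override public String term() { return input.term(); }
        @Override public void end() { input.end(); }
        @Override public int finalOffset() { return input.finalOffset(); }
    }

    // Consumer following steps 4 and 5: incrementToken() until false, then end().
    static int consume(Stream stream, List<String> terms) {
        while (stream.incrementToken()) terms.add(stream.term());
        stream.end();
        return stream.finalOffset();
    }

    public static void main(String[] args) {
        String text = "a b c d e"; // 9 characters
        List<String> early = new ArrayList<>();
        int earlyOffset = consume(new LimitFilter(new WhitespaceTokenizer(text), 2, false), early);
        List<String> drained = new ArrayList<>();
        int drainedOffset = consume(new LimitFilter(new WhitespaceTokenizer(text), 2, true), drained);
        System.out.println(early + " finalOffset=" + earlyOffset);     // [a, b] finalOffset=3
        System.out.println(drained + " finalOffset=" + drainedOffset); // [a, b] finalOffset=9
    }
}
```

Both filters emit the same two tokens, but only the draining variant lets end() see a fully consumed tokenizer and report an offset covering the whole text. The trade-off is that draining still tokenizes the entire input, which gives up much of the work a limiting filter might otherwise save on very large field values.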