[ https://issues.apache.org/jira/browse/LUCENE-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748071#comment-16748071 ]

Alan Woodward commented on LUCENE-8651:
---------------------------------------

I think this may be a question of documentation; Tokenizers pull data from a
Reader, which can't be reset, so it makes no sense to call reset() on a
Tokenizer without calling setReader() first. Generally speaking, the contract
is around what a consumer should do with a tokenstream once it has received
it - reset(), incrementToken(), end(), close().

What you're trying to do with the ConcatenatingTokenStream doesn't make a lot
of sense to me - can you explain what it is that you want to do here?

> Tokenizer implementations can't be reset
> ----------------------------------------
>
>                 Key: LUCENE-8651
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8651
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Dan Meehl
>            Priority: Major
>         Attachments: LUCENE-8650-2.patch, LUCENE-8651.patch
>
>
> The fine print here is that they can't be reset without calling setReader()
> every time before reset() is called. The reason for this is that Tokenizer
> violates the contract put forth by TokenStream.reset() which is the following:
> "Resets this stream to a clean state. Stateful implementations must implement
> this method so that they can be reused, just as if they had been created
> fresh."
> Tokenizer implementation's reset function can't reset in that manner because
> their Tokenizer.close() removes the reference to the underlying Reader
> because of LUCENE-2387. The catch-22 here is that we don't want to
> unnecessarily keep around a Reader (memory leak) but we would like to be able
> to reset() if necessary.
> The patches include an integration test that attempts to use a
> ConcatenatingTokenStream to join an input TokenStream with a KeywordTokenizer
> TokenStream. This test fails with an IllegalStateException thrown by
> Tokenizer.ILLEGAL_STATE_READER.
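
For readers of the archive, the consumer workflow named in the comment
(reset(), incrementToken(), end(), close()) and the failure described in the
quoted report can be sketched with a small stand-alone model. This is only an
illustration under stated assumptions: ToyKeywordTokenizer, term() and
consume() are hypothetical names invented here, the class does not use or
reproduce Lucene's Tokenizer code, and it throws directly from reset(), which
is a simplification of how the real class reports the problem.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a keyword-style tokenizer; NOT Lucene's implementation. */
final class ToyKeywordTokenizer {
    private Reader pending; // handed over by setReader(), picked up by reset()
    private Reader input;   // reader currently being consumed, dropped by close()
    private boolean done;
    private String term;

    void setReader(Reader reader) {
        if (input != null) {
            throw new IllegalStateException("close() must be called before setReader()");
        }
        pending = reader;
    }

    void reset() {
        // Assumption of this sketch: fail fast here when no new reader was supplied.
        if (pending == null) {
            throw new IllegalStateException("setReader() must be called before reset()");
        }
        input = pending;
        pending = null;
        done = false;
    }

    /** Emits the whole input as a single token, then reports exhaustion. */
    boolean incrementToken() {
        if (input == null) {
            throw new IllegalStateException("reset() must be called before incrementToken()");
        }
        if (done) {
            return false;
        }
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                sb.append((char) c);
            }
            term = sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        done = true;
        return true;
    }

    String term() {
        return term;
    }

    void end() {
        // Nothing to finalize in this toy model.
    }

    /** Releases the reader so it is not kept alive; a new setReader() is then required. */
    void close() {
        try {
            if (input != null) {
                input.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            input = null;
            pending = null;
        }
    }
}

class TokenizerContractDemo {
    /** Consumer side of the contract: reset(), incrementToken()*, end(), close(). */
    static List<String> consume(ToyKeywordTokenizer t) {
        List<String> terms = new ArrayList<>();
        try {
            t.reset();
            while (t.incrementToken()) {
                terms.add(t.term());
            }
            t.end();
        } finally {
            t.close();
        }
        return terms;
    }

    public static void main(String[] args) {
        ToyKeywordTokenizer t = new ToyKeywordTokenizer();

        t.setReader(new StringReader("first"));
        System.out.println(consume(t));

        // Reuse works only because a fresh reader is supplied before reset().
        t.setReader(new StringReader("second"));
        System.out.println(consume(t));

        // After close() the reader is gone, so reset() alone cannot rewind.
        try {
            t.reset();
        } catch (IllegalStateException e) {
            System.out.println("reset() without setReader(): " + e.getMessage());
        }
    }
}
```

In this model, reuse means close(), then setReader() with a new Reader, then
reset(); calling reset() alone after close() fails because the reader
reference has been released, which is the trade-off the quoted report
attributes to LUCENE-2387.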

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org