[ https://issues.apache.org/jira/browse/LUCENE-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746759#comment-16746759 ]

Daniel Meehl edited comment on LUCENE-8651 at 1/19/19 12:58 AM:
----------------------------------------------------------------

As a little more of an explanation, all I did here was replace the 
KeywordTokenStream (from the 1st patch) with a KeywordTokenizer. This causes the 
test to fail with an IllegalStateException because the KeywordTokenizer has 
its close() and then reset() methods called, which swaps out the previously set 
reader for Tokenizer.ILLEGAL_STATE_READER.
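
For context, here is a minimal stand-alone sketch of the same failure mode 
(class and variable names are my own; it assumes lucene-analyzers-common on the 
classpath):

{code:java}
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenizerResetRepro {
  public static void main(String[] args) throws IOException {
    KeywordTokenizer tok = new KeywordTokenizer();
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);

    // First, correct consume cycle: setReader -> reset -> incrementToken -> end -> close
    tok.setReader(new StringReader("first"));
    tok.reset();
    while (tok.incrementToken()) {
      System.out.println(term.toString());
    }
    tok.end();
    tok.close(); // per LUCENE-2387, close() swaps the reader for Tokenizer.ILLEGAL_STATE_READER

    // Second cycle without calling setReader() again: reset() itself succeeds, but
    // the tokenizer is now wired to ILLEGAL_STATE_READER, so the next read throws.
    tok.reset();
    tok.incrementToken(); // throws IllegalStateException ("TokenStream contract violation ...")
  }
}
{code}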


> Tokenizer implementations can't be reset
> ----------------------------------------
>
>                 Key: LUCENE-8651
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8651
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Daniel Meehl
>            Priority: Major
>         Attachments: LUCENE-8650-2.patch, LUCENE-8651.patch
>
>
> The fine print here is that they can't be reset without calling setReader() 
> every time before reset() is called. The reason for this is that Tokenizer 
> violates the contract put forth by TokenStream.reset() which is the following:
> "Resets this stream to a clean state. Stateful implementations must implement 
> this method so that they can be reused, just as if they had been created 
> fresh."
> Tokenizer implementations' reset() methods can't reset in that manner because 
> Tokenizer.close() removes the reference to the underlying Reader (because of 
> LUCENE-2387). The catch-22 here is that we don't want to unnecessarily keep a 
> Reader around (memory leak), but we would like to be able to reset() if 
> necessary.
> The patches include an integration test that attempts to use a 
> ConcatenatingTokenStream to join an input TokenStream with a KeywordTokenizer 
> TokenStream. This test fails with an IllegalStateException thrown by 
> Tokenizer.ILLEGAL_STATE_READER.
>  
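
As the quoted description notes, the only way to reuse a Tokenizer today is to 
call setReader() again before every reset(). A minimal sketch of that required 
lifecycle (my own names; assumes lucene-analyzers-common):

{code:java}
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenizerReuseSketch {
  // Reuses a single Tokenizer instance by following the full lifecycle each time:
  // setReader -> reset -> incrementToken... -> end -> close.
  static void consume(KeywordTokenizer tok, Reader reader) throws IOException {
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    tok.setReader(reader); // must be repeated after close(), before reset()
    tok.reset();
    while (tok.incrementToken()) {
      System.out.println(term.toString());
    }
    tok.end();
    tok.close(); // drops the Reader reference (LUCENE-2387)
  }

  public static void main(String[] args) throws IOException {
    KeywordTokenizer tok = new KeywordTokenizer();
    consume(tok, new StringReader("first input"));
    consume(tok, new StringReader("second input")); // works only because setReader() precedes each reset()
  }
}
{code}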


