Robert Muir created LUCENE-4343:
-----------------------------------

             Summary: Clear up more Tokenizer.setReader/TokenStream.reset issues
                 Key: LUCENE-4343
                 URL: https://issues.apache.org/jira/browse/LUCENE-4343
             Project: Lucene - Core
          Issue Type: Task
          Components: modules/analysis
            Reporter: Robert Muir
         Attachments: LUCENE-4343.patch

Spinoff from a user-list thread.

I think the rename helps, but the javadocs still have problems: they seem to 
describe only a totally wacky case (CachingTokenFilter), not the normal case.

Ideally setReader would be final, I think, but there are a few crazy 
tokenstreams to fix before I could make that work. It would also need something 
hackish so that MockTokenizer's state machine still works.

But I worked on fixing up the mess in our various tokenstreams, which is easy 
for the most part.

As part of this, I found that best-effort exceptions for broken consumers, 
wherever they cost nothing, were really useful for flushing out test bugs (in 
tests that don't use MockTokenizer, which they really should).

For example:
{noformat}
-  private int offset = 0, bufferIndex = 0, dataLen = 0, finalOffset = 0;
+  // note: bufferIndex is -1 here to best-effort AIOOBE consumers that don't call reset()
+  private int offset = 0, bufferIndex = -1, dataLen = 0, finalOffset = 0;
{noformat}
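To make the sentinel trick concrete, here is a minimal, self-contained toy tokenizer. It is a hypothetical sketch, not Lucene's actual CharTokenizer: the class name SentinelTokenizer, the single read into the buffer inside reset(), and re-arming the sentinel in setReader() are all assumptions for illustration. Because bufferIndex starts at -1 and dataLen at 0, the "consume while bufferIndex < dataLen" loop runs and reads buffer[-1] unless reset() has run first, so a broken consumer fails immediately with ArrayIndexOutOfBoundsException while the normal path pays nothing extra.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical toy tokenizer (NOT Lucene's CharTokenizer) that splits on
// whitespace and uses bufferIndex = -1 as a free "did you call reset()?" check.
public class SentinelTokenizer {
    private final char[] buffer = new char[4096];
    private final StringBuilder term = new StringBuilder();
    private Reader input;
    // -1 until reset() runs: a consumer that skips reset() reads buffer[-1]
    private int bufferIndex = -1, dataLen = 0;

    public void setReader(Reader reader) {
        input = reader;
        bufferIndex = -1; // assumption: re-arm the sentinel on every new reader
    }

    public void reset() throws IOException {
        // Simplification: read once (up to 4096 chars); real tokenizers refill lazily.
        dataLen = Math.max(0, input.read(buffer));
        bufferIndex = 0;
    }

    public boolean incrementToken() {
        term.setLength(0);
        while (bufferIndex < dataLen) {
            // If reset() was skipped, bufferIndex is -1 and this throws AIOOBE.
            char c = buffer[bufferIndex++];
            if (Character.isWhitespace(c)) {
                if (term.length() > 0) return true; // whitespace ends the current token
            } else {
                term.append(c);
            }
        }
        return term.length() > 0; // emit the final token, if any, at end of data
    }

    public String term() {
        return term.toString();
    }

    public static void main(String[] args) throws IOException {
        SentinelTokenizer ok = new SentinelTokenizer();
        ok.setReader(new StringReader("hello token stream"));
        ok.reset(); // correct consumer: reset() before incrementToken()
        while (ok.incrementToken()) {
            System.out.println(ok.term());
        }

        SentinelTokenizer broken = new SentinelTokenizer();
        broken.setReader(new StringReader("hello"));
        try {
            broken.incrementToken(); // broken consumer: reset() never called
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught broken consumer: " + e.getClass().getSimpleName());
        }
    }
}
```

In this toy, without the -1 sentinel a fresh or reused instance would just see bufferIndex == dataLen and return false, silently producing zero tokens instead of failing loudly.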

I think this is worth exploring more... this was really effective at finding 
broken tests, etc. We should see if we can be more thorough and ideally throw 
better exceptions when consumers are broken and it's free.
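One way to get a better exception than an incidental AIOOBE is an explicit state check that names the missing call. The sketch below is hypothetical: ContractCheckedStream and its State enum are not Lucene classes, its "tokenization" just returns one token per character, and the state transitions are only loosely in the spirit of the MockTokenizer state machine mentioned above. It is not strictly free like the sentinel, but the per-call cost is a single field comparison.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch (not Lucene API): explicit consumer-contract checks
// that report which call the consumer forgot.
public class ContractCheckedStream {
    enum State { CLOSED, READER_SET, RESET, ENDED }

    private State state = State.CLOSED;
    private Reader input;

    public void setReader(Reader reader) {
        if (state != State.CLOSED) {
            throw new IllegalStateException(
                "setReader() called in state " + state + "; did you forget close()?");
        }
        input = reader;
        state = State.READER_SET;
    }

    public void reset() {
        if (state != State.READER_SET) {
            throw new IllegalStateException(
                "reset() called in state " + state + "; call setReader() first");
        }
        state = State.RESET;
    }

    public boolean incrementToken() throws IOException {
        if (state != State.RESET) {
            throw new IllegalStateException(
                "incrementToken() called in state " + state + "; did you forget reset()?");
        }
        // Toy "tokenization": each character read counts as one token.
        return input.read() != -1;
    }

    public void end() {
        if (state != State.RESET) {
            throw new IllegalStateException(
                "end() called in state " + state + "; call reset() and consume the stream first");
        }
        state = State.ENDED;
    }

    public void close() throws IOException {
        if (input != null) {
            input.close();
        }
        input = null;
        state = State.CLOSED; // the instance may now be reused via setReader()
    }

    public static void main(String[] args) throws IOException {
        ContractCheckedStream s = new ContractCheckedStream();
        s.setReader(new StringReader("abc"));
        try {
            s.incrementToken(); // broken consumer: forgot reset()
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The trade-off versus the sentinel is clarity: the message tells the test author exactly which step of the setReader/reset/incrementToken/end/close sequence was skipped.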

