[ 
https://issues.apache.org/jira/browse/LUCENE-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222913#comment-17222913
 ] 

Robert Muir commented on LUCENE-9588:
-------------------------------------

Sorry, the JapaneseTokenizer example doesn't hold up: it compares apples with 
oranges. JapaneseTokenizer doesn't subclass this class, so of course its 
incrementToken throws IOException: it has to read from the Reader, and its 
logic mixes that I/O with segmentation.

This class, on the other hand, exists precisely to separate these two things. 
If you want to mix I/O and segmentation (like JapaneseTokenizer, which does 
both in a streaming fashion), then this base class is simply inappropriate 
and you should just subclass {{Tokenizer}}.
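
To illustrate the separation, here is a rough, self-contained sketch. It is not the Lucene source: {{SentenceBase}}, {{WordTokenizer}}, the {{text}} and {{term}} fields and the {{tokenize}} helper are names made up for illustration, and for simplicity it reads the whole input at once instead of refilling a fixed buffer. Only the hook names setNextSentence and incrementWord mirror the real class. The point is that the base owns all reading from the Reader, so only its incrementToken needs to declare IOException, while the hooks work purely on in-memory text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SegmentingSketch {

  /** Hypothetical stand-in for the base class: it owns all I/O. */
  abstract static class SentenceBase {
    private final Reader input;
    private final BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
    private boolean loaded = false;
    private boolean inSentence = false;
    /** In-memory copy of the input, visible to subclasses (assumption of this sketch). */
    protected String text = "";

    SentenceBase(Reader input) { this.input = input; }

    /** The only place that reads from the Reader, hence the only checked exception. */
    final boolean incrementToken() throws IOException {
      if (!loaded) load();
      while (!inSentence || !incrementWord()) {
        if (!incrementSentence()) return false; // no more sentences
      }
      return true;
    }

    private void load() throws IOException {
      StringBuilder sb = new StringBuilder();
      char[] buf = new char[1024];
      int n;
      while ((n = input.read(buf)) != -1) sb.append(buf, 0, n);
      text = sb.toString();
      sentences.setText(text);
      loaded = true;
    }

    /** Pure in-memory work: no throws clause needed. */
    private boolean incrementSentence() {
      int start = sentences.current();
      int end = sentences.next();
      if (end == BreakIterator.DONE) { inSentence = false; return false; }
      setNextSentence(start, end);
      inSentence = true;
      return true;
    }

    /** Hooks for subclasses: segmentation only, no I/O. */
    protected abstract void setNextSentence(int sentenceStart, int sentenceEnd);
    protected abstract boolean incrementWord();
  }

  /** Hypothetical subclass: splits each sentence into words, never touches the Reader. */
  static class WordTokenizer extends SentenceBase {
    private final BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
    private String sentence = "";
    String term;

    WordTokenizer(Reader in) { super(in); }

    @Override protected void setNextSentence(int sentenceStart, int sentenceEnd) {
      sentence = text.substring(sentenceStart, sentenceEnd);
      words.setText(sentence);
    }

    @Override protected boolean incrementWord() {
      int start = words.current();
      for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
        // Skip whitespace and punctuation segments.
        if (Character.isLetterOrDigit(sentence.charAt(start))) {
          term = sentence.substring(start, end);
          return true;
        }
      }
      return false;
    }
  }

  /** Convenience driver for the sketch. */
  static List<String> tokenize(String input) throws IOException {
    WordTokenizer t = new WordTokenizer(new StringReader(input));
    List<String> out = new ArrayList<>();
    while (t.incrementToken()) out.add(t.term);
    return out;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(tokenize("Hello world. Bye now."));
  }
}
```

A subclass that needs to pull tokens from another stream inside incrementWord does not fit this shape, which is why subclassing {{Tokenizer}} directly is the better choice in that case.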

I agree that incrementSentence() should not throw IOException; that's a bug. 
It is an oversight and it gives the wrong impression. We can remove the 
{{throws IOException}} there without breaking any subclasses.


> Exceptions handling in methods of SegmentingTokenizerBase
> ---------------------------------------------------------
>
>                 Key: LUCENE-9588
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9588
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 8.6.3
>            Reporter: Nguyen Minh Gia Huy
>            Priority: Minor
>
> The *setNextSentence* and *incrementWord* methods of 
> *SegmentingTokenizerBase* do not declare any checked exceptions, which makes 
> them troublesome to override.
> For example, if we override incrementWord with logic that invokes 
> incrementToken on another tokenizer, that incrementToken can throw 
> IOException, but incrementWord is not declared to throw it.
> I think letting setNextSentence and incrementWord throw IOException would 
> make SegmentingTokenizerBase easier to use.



