[ https://issues.apache.org/jira/browse/LUCENE-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226471#comment-17226471 ]
Nguyen Minh Gia Huy commented on LUCENE-9588:
---------------------------------------------

I wonder what the appropriate usage of this class is.

Let's say I want a Tokenizer that breaks the text into sentences and sends each sentence to another tokenizer, for example WhitespaceTokenizer, for segmentation. To do so, I would have to make that tokenizer extend SegmentingTokenizerBase and invoke the WhitespaceTokenizer in the *incrementWord* method. WhitespaceTokenizer is a Tokenizer, so its incrementToken can throw an IOException during analysis, but *incrementWord* does not declare that exception.

How can the I/O and the segmentation be separated in such cases? Is SegmentingTokenizerBase intended only for segmentation that involves no I/O? For example, [HMMChineseTokenizer|https://github.com/apache/lucene-solr/blob/master/lucene/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizer.java#L46] splits each sentence with WordSegmenter, which doesn't require I/O handling.

> Exceptions handling in methods of SegmentingTokenizerBase
> ---------------------------------------------------------
>
>                 Key: LUCENE-9588
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9588
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 8.6.3
>            Reporter: Nguyen Minh Gia Huy
>            Priority: Minor
>
> The current signatures of the *setNextSentence* and *incrementWord* methods in
> *SegmentingTokenizerBase* do not declare any checked exceptions, which makes
> the class troublesome to extend.
> For example, if we override incrementWord with logic that invokes
> incrementToken on another tokenizer, incrementToken can raise an
> IOException, but incrementWord is not declared to throw it.
> I think having setNextSentence and incrementWord declare IOException would
> make SegmentingTokenizerBase easier to use.
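
To illustrate the problem described here, the following is a minimal, self-contained sketch of one possible workaround under the current signatures: wrapping the checked IOException in java.io.UncheckedIOException inside the hook and unwrapping it at the call site. It does not use Lucene. DelegateTokenizer, SegmentingBaseSketch, DelegatingSegmenter and WrapDemo are hypothetical stand-ins that only mirror the shape of Tokenizer, SegmentingTokenizerBase and a subclass; whether this pattern is acceptable in a real analysis chain is an assumption, not something the issue confirms.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a delegate such as a whitespace tokenizer:
// advancing it may fail with a checked IOException.
interface DelegateTokenizer {
    boolean incrementToken() throws IOException;
}

// Hypothetical stand-in mirroring the shape described in the issue:
// the public method may throw IOException, the hook may not.
abstract class SegmentingBaseSketch {
    public final boolean incrementToken() throws IOException {
        return incrementWord();
    }

    // No "throws" clause, so overrides cannot propagate IOException directly.
    protected abstract boolean incrementWord();
}

// Subclass that forwards each word request to the delegate.
class DelegatingSegmenter extends SegmentingBaseSketch {
    private final DelegateTokenizer delegate;

    DelegatingSegmenter(DelegateTokenizer delegate) {
        this.delegate = delegate;
    }

    @Override
    protected boolean incrementWord() {
        try {
            return delegate.incrementToken();
        } catch (IOException e) {
            // Tunnel the checked exception through the unchecked-only hook.
            throw new UncheckedIOException(e);
        }
    }
}

class WrapDemo {
    // Consumes all tokens and restores the original checked exception, if any.
    static int countTokens(SegmentingBaseSketch segmenter) throws IOException {
        int count = 0;
        try {
            while (segmenter.incrementToken()) {
                count++;
            }
        } catch (UncheckedIOException e) {
            throw e.getCause(); // getCause() is typed as IOException
        }
        return count;
    }
}
```

The cost of this approach is that every caller has to know about the wrapping; declaring IOException on the hooks, as the issue proposes, would make the try/catch in both places unnecessary.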