[
https://issues.apache.org/jira/browse/LUCENE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952490#comment-15952490
]
Amrit Sarkar commented on LUCENE-7729:
--------------------------------------
:)
I looked into SimplePatternTokenizer and how it does its pattern matching
using deterministic finite-state automata. CharacterRunAutomaton is the key
building block for the hypothetical PatternBreakIterator. It should not be much
work, since the automaton machinery is already implemented extensively and
SimplePatternTokenizer provides a good example to follow. I will try to put
something together based on it and post an update soon.
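
To make the idea concrete, here is a rough sketch of the boundary semantics
only. It is not the patch and it does not use Lucene: java.util.regex stands in
for CharacterRunAutomaton, and the class and method names (PatternBreakSketch,
boundaries) are hypothetical. It assumes a break falls immediately after each
separator match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Lucene: break offsets for a pattern separator. */
public class PatternBreakSketch {

    /**
     * Returns the boundary offsets of text, assuming a break falls immediately
     * after each separator match. Offset 0 and text.length() are always included.
     */
    public static List<Integer> boundaries(String text, Pattern separator) {
        List<Integer> result = new ArrayList<>();
        result.add(0);
        Matcher m = separator.matcher(text);
        while (m.find()) {
            // Ignore empty matches so that every boundary makes progress.
            if (m.end() > m.start() && m.end() > result.get(result.size() - 1)) {
                result.add(m.end());
            }
        }
        // The end of the text is always the last boundary.
        if (result.get(result.size() - 1) != text.length()) {
            result.add(text.length());
        }
        return result;
    }

    public static void main(String[] args) {
        // A literal string separator of arbitrary length, quoted so it is not
        // interpreted as a regular expression.
        Pattern sep = Pattern.compile(Pattern.quote("||"));
        System.out.println(boundaries("a||bc||d", sep)); // prints [0, 3, 7, 8]
    }
}
```

A real PatternBreakIterator would extend java.text.BreakIterator and run the
automaton incrementally from the current position instead of precomputing all
offsets, but the offsets it reports should match the ones above.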
> Support for string type separator for CustomSeparatorBreakIterator
> ------------------------------------------------------------------
>
> Key: LUCENE-7729
> URL: https://issues.apache.org/jira/browse/LUCENE-7729
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: Amrit Sarkar
> Attachments: LUCENE-7729.patch, LUCENE-7729.patch
>
>
> LUCENE-6485: currently, CustomSeparatorBreakIterator breaks the text wherever
> the given separator _char_ is found.
> This patch improves CustomSeparatorBreakIterator so that it also supports a
> string separator of arbitrary length.