[ 
https://issues.apache.org/jira/browse/LUCENE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952490#comment-15952490
 ] 

Amrit Sarkar commented on LUCENE-7729:
--------------------------------------

:)

I looked into SimplePatternTokenizer and how it does its pattern matching 
using a deterministic finite-state automaton. CharacterRunAutomaton is the 
key building block for the hypothetical PatternBreakIterator. It should not 
be much work, since the automaton machinery is already implemented and 
SimplePatternTokenizer provides a good working example. I will try to put 
something together based on it and post an update soon.
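
To make the intended behaviour concrete, here is a minimal sketch of breaking 
text on a multi-character separator. It is not the attached patch and does not 
use Lucene: the class StringSeparatorBoundaries and its methods are 
hypothetical names, and plain String.indexOf stands in for the automaton 
matching that a PatternBreakIterator would presumably use. It assumes a break 
falls immediately after each separator occurrence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class StringSeparatorBoundaries {

    // Returns the index just past the first separator occurrence that starts
    // at or after `offset`, or text.length() if no such occurrence exists.
    static int following(String text, String separator, int offset) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        int idx = text.indexOf(separator, offset);
        return idx < 0 ? text.length() : idx + separator.length();
    }

    // Splits the text into passages, each ending right after a separator
    // (the last passage ends at the end of the text).
    static List<String> split(String text, String separator) {
        List<String> passages = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = following(text, separator, start);
            passages.add(text.substring(start, end));
            start = end; // always advances, because the separator is non-empty
        }
        return passages;
    }

    public static void main(String[] args) {
        for (String passage : split("first||second||third", "||")) {
            System.out.println(passage);
        }
    }
}
```

A pattern-based variant would keep the same shape and only change how the end 
of the next match is located.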

> Support for string type separator for CustomSeparatorBreakIterator
> ------------------------------------------------------------------
>
>                 Key: LUCENE-7729
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7729
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Amrit Sarkar
>         Attachments: LUCENE-7729.patch, LUCENE-7729.patch
>
>
> LUCENE-6485: CustomSeparatorBreakIterator currently breaks the text wherever 
> the separator _char_ passed to it is found.
> This patch improves CustomSeparatorBreakIterator so that it supports a 
> string separator of arbitrary length.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)