[
https://issues.apache.org/jira/browse/LUCENE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937734#comment-15937734
]
Amrit Sarkar commented on LUCENE-7729:
--------------------------------------
bq. len > 0 (as a comment) but in all cases you probably mean len > 1?
Yes, that is correct.
bq. Let me give a better example of length 3: aab would fail to match aaab. I
just wrote a test for that to confirm it failed. Here's another example of
length 4 that may be more clear: A separator of acab would fail to be detected
in acacab.
I see. The implementation is flawed: the algorithm I had in mind is incomplete,
though some minor tweaking should make it work. I never considered repetitive
patterns in the separator.
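
To illustrate the failure mode, here is a minimal sketch. It is not the code
from the patch; the class and method names are hypothetical. It assumes a
single-pass matcher that resets its match counter to zero on a mismatch, and
compares it against String.indexOf on the two examples above.

```java
// Hypothetical sketch, not the LUCENE-7729 patch itself.
class SeparatorMatchSketch {

    // Single-pass matcher that throws away the whole partial match on a
    // mismatch. Assumes a non-empty separator. This is the flawed approach:
    // it never reconsiders characters that were already consumed.
    static int naiveIndexOf(String text, String sep) {
        int matched = 0; // number of separator chars matched so far
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == sep.charAt(matched)) {
                matched++;
                if (matched == sep.length()) {
                    return i - sep.length() + 1; // start of the separator
                }
            } else {
                matched = 0; // flaw: a shorter prefix may still be matching
            }
        }
        return -1;
    }

    // Reference behaviour: the JDK's own substring search.
    static int referenceIndexOf(String text, String sep) {
        return text.indexOf(sep);
    }

    public static void main(String[] args) {
        String[][] cases = { {"aaab", "aab"}, {"acacab", "acab"}, {"foo||bar", "||"} };
        for (String[] c : cases) {
            System.out.println(c[1] + " in " + c[0]
                + ": naive=" + naiveIndexOf(c[0], c[1])
                + ", reference=" + referenceIndexOf(c[0], c[1]));
        }
    }
}
```

The naive matcher only goes wrong when a prefix of the separator reappears
inside it (aab, acab); for a separator such as || both methods agree.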
bq. To be clear, I never asked or recommended.
David, I completely understand and am aware of that; I just pointed out the
conversation that motivated me to look into it. I am thankful to you for taking
the time to provide healthy insights and feedback on the patch. I will not get
discouraged if some of my work doesn't get into the main project; I want to
contribute work that is useful, not flawed.
With that, I will check out SimplePatternTokenizer and the Automaton part.
Thank you again for your time; I really appreciate it. Should I leave this JIRA
as it is, or at least fix the implementation?
> Support for string type separator for CustomSeparatorBreakIterator
> ------------------------------------------------------------------
>
> Key: LUCENE-7729
> URL: https://issues.apache.org/jira/browse/LUCENE-7729
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: Amrit Sarkar
> Attachments: LUCENE-7729.patch, LUCENE-7729.patch
>
>
> LUCENE-6485: currently CustomSeparatorBreakIterator breaks the text when the
> _char_ passed is found.
> Improved CustomSeparatorBreakIterator so that it now supports a string
> separator of arbitrary length.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)