[
https://issues.apache.org/jira/browse/LUCENE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937656#comment-15937656
]
David Smiley commented on LUCENE-7729:
--------------------------------------
BTW, in several code comments in your patch, and in this issue's commentary,
I've seen {{len > 0}}, but in all cases you probably mean {{len > 1}}?
RE resetting a failed match: Good point that your patch addresses the specific
example I gave, and apparently any separator of length 2. Let me give a better
example of length 3: a separator of {{aab}} would fail to be detected in
{{aaab}}. I just wrote a test for that and confirmed it fails. Here's another
example, of length 4, that may be clearer: a separator of {{acab}} would fail
to be detected in {{acacab}}.
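To make the failure mode above concrete, here is a minimal sketch. It is not the code from the patch: {{naiveIndexOf}} and the class name are hypothetical, and the scanner only assumes the general approach of dropping all progress on a mismatch and re-checking the current char against the separator's first char. {{String.indexOf}} serves as the reference answer.

```java
final class SeparatorScanSketch {

    // Hypothetical single-pass scanner (an assumption, not the patch's code).
    // On a mismatch it forgets all progress and only re-checks the current
    // char against the first separator char. Assumes a non-empty separator.
    static int naiveIndexOf(String text, String sep) {
        int matched = 0; // number of separator chars matched so far
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == sep.charAt(matched)) {
                matched++;
            } else {
                matched = (c == sep.charAt(0)) ? 1 : 0;
            }
            if (matched == sep.length()) {
                return i - sep.length() + 1; // start offset of the separator
            }
        }
        return -1; // separator not detected
    }

    public static void main(String[] args) {
        // Each pair is {text, separator}.
        String[][] cases = {
            {"aab", "ab"},      // length 2: found at 1
            {"aaab", "aab"},    // length 3: missed, although it starts at 1
            {"acacab", "acab"}, // length 4: missed, although it starts at 2
        };
        for (String[] c : cases) {
            System.out.println(c[1] + " in " + c[0]
                + ": naive=" + naiveIndexOf(c[0], c[1])
                + " indexOf=" + c[0].indexOf(c[1]));
        }
    }
}
```

The misses happen because the discarded prefix ({{aa}} or {{aca}}) ends with chars that already begin the next candidate match; a scanner has to either back up in the text or keep a failure table (as in Knuth-Morris-Pratt) to handle that.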
testBreakOnCustomSeparator: you commented out a couple of assertions because
they don't apply when the separator is longer than 1. Instead you could guard
them with a condition so they only run when the separator length is 1.
RE my proposed single char constructor: this is just syntactic sugar (i.e.
convenience). With it, a bunch of the changes in your diff could stay the
same.
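A minimal sketch of what such a convenience constructor could look like. The class below is a hypothetical stand-in, not the real CustomSeparatorBreakIterator, and the empty-string check is an assumption of this sketch:

```java
final class StringSeparatorSketch {

    private final String separator;

    // Primary constructor: separator of arbitrary (non-zero) length.
    StringSeparatorSketch(String separator) {
        if (separator == null || separator.isEmpty()) {
            throw new IllegalArgumentException("separator must be non-empty");
        }
        this.separator = separator;
    }

    // Convenience constructor: existing single-char callers keep compiling.
    StringSeparatorSketch(char separator) {
        this(String.valueOf(separator));
    }

    String separator() {
        return separator;
    }
}
```

Callers that pass a char such as {{'\n'}} would not need to change, which is why parts of the diff could be left alone.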
bq. I observed the code and understood it will not require major refactoring to
change the current implementation for arbitrary length string.
Yeah, I figured. I envy the time you have on your hands to implement a feature
that nobody has (yet) asked for :-) To be clear, I never asked for or
recommended it. I sometimes work on something that's fun to me too; scratch
some itch.
Speaking of scratching itches... check out SimplePatternTokenizer (recently
added to Lucene) and how it works with an Automaton. Now I'm sure *that* would
be useful to users; the original Highlighter (via Solr at least) had a regexp
passage splitter. One possible direction you might take is to leave
CustomSeparatorBreakIterator as it is and instead add a new break iterator
that takes a regexp/automaton... then any user who wants to split on a string
could use that one.
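As a rough illustration of that direction, the sketch below computes break offsets from a pattern. It uses {{java.util.regex}} rather than Lucene's Automaton classes, purely to stay self-contained; the class and method names are hypothetical, and placing the break just past each match is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexBreaksSketch {

    // Returns the offset just past every non-empty separator match.
    static List<Integer> breaksAfter(Pattern separator, CharSequence text) {
        List<Integer> breaks = new ArrayList<>();
        Matcher m = separator.matcher(text);
        while (m.find()) {
            if (m.end() > m.start()) { // ignore zero-length matches
                breaks.add(m.end());
            }
        }
        return breaks;
    }

    public static void main(String[] args) {
        // A plain string separator is just a quoted (literal) pattern.
        Pattern literal = Pattern.compile(Pattern.quote("acab"));
        System.out.println(breaksAfter(literal, "acacabxx")); // [6]

        // A real regexp: one or more newlines.
        Pattern newlines = Pattern.compile("\\n+");
        System.out.println(breaksAfter(newlines, "a\n\nb\nc")); // [3, 5]
    }
}
```

With this shape, splitting on a fixed string is the special case of a literal pattern, so a separate string-separator implementation would not be needed.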
> Support for string type separator for CustomSeparatorBreakIterator
> ------------------------------------------------------------------
>
> Key: LUCENE-7729
> URL: https://issues.apache.org/jira/browse/LUCENE-7729
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: Amrit Sarkar
> Attachments: LUCENE-7729.patch, LUCENE-7729.patch
>
>
> LUCENE-6485: currently CustomSeparatorBreakIterator breaks the text when the
> _char_ passed is found.
> Improved CustomSeparatorBreakIterator; as it now supports separator of string
> type of arbitrary length.