[ 
https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805199#comment-15805199
 ] 

Jim Ferenczi commented on LUCENE-7620:
--------------------------------------

{quote}
By choosing a lengthGoal on the low side, maybe "too long" will tend not to be 
a problem? Or see my TODO at the top of the file – essentially, choose the break 
that is closest to the goal instead of always the first one following it.
{quote}
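The TODO contrasts two ways of ending a passage: take the first sentence boundary at or after the goal, or take whichever boundary is nearest to it. Below is a minimal sketch of both, assuming a plain java.text sentence BreakIterator. The class and method names (GoalBreakSketch, firstFollowing, closest) are hypothetical and are not the patch's API.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helpers, not the patch's API: two ways to pick a passage end for a
// passage starting at `start` with a target length of `lengthGoal`.
// Assumes lengthGoal >= 1 and 0 <= start < text.length().
final class GoalBreakSketch {

  // First boundary at or after the goal (the passage may overshoot).
  static int firstFollowing(BreakIterator bi, String text, int start, int lengthGoal) {
    int target = start + lengthGoal;
    if (target >= text.length()) {
      return text.length(); // goal lies past the end: take the rest of the text
    }
    bi.setText(text);
    int end = bi.following(target - 1); // following() is exclusive: first boundary >= target
    return end == BreakIterator.DONE ? text.length() : end;
  }

  // The TODO's alternative: the boundary nearest the goal, never an empty passage.
  static int closest(BreakIterator bi, String text, int start, int lengthGoal) {
    int target = start + lengthGoal;
    if (target >= text.length()) {
      return text.length();
    }
    bi.setText(text);
    int after = bi.following(target - 1);  // first boundary >= target
    int before = bi.preceding(target + 1); // last boundary <= target
    if (after == BreakIterator.DONE) {
      after = text.length();
    }
    if (before == BreakIterator.DONE || before <= start) {
      return after; // nothing usable before the goal, so overshoot
    }
    return (target - before) <= (after - target) ? before : after;
  }
}
```

With a short first sentence followed by a long one, closest() ends the passage early while firstFollowing() swallows the long sentence; the first trades "too long" passages for possibly "too short" ones, which is the tension discussed in the reply.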

Yeah, it depends on how the lengthGoal is perceived. I was looking at it as a 
boundary, mainly to solve "too long" fragments, whereas this issue is more about 
"too short" fragments. Maybe that's a different issue, then, but I'm just afraid 
that we'll end up with multiple public break iterator impls that must follow a 
specific pattern to be used.
Anyway, this patch is a start toward better highlighting through custom break 
iterators, and it solves a real issue. Please push it to 6.4 if you think it's 
ready; we can always discuss the next steps in a follow-up.
Regarding the assertion, I'd prefer an IllegalStateException with a clear 
message, but maybe I'm too paranoid.
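For context on the assertion point: a Java assert is skipped unless the JVM runs with -ea, while an IllegalStateException always fires and can explain what went wrong. The comment does not say which condition the patch asserts, so the "offsets only move forward" precondition below is a hypothetical stand-in, and CallPatternCheck/checkForwardOnly are illustrative names.

```java
// Hypothetical stand-in for the patch's assertion: the condition checked here
// ("offsets passed to following() never move backwards") is an assumption.
final class CallPatternCheck {

  // `assert offset >= previousOffset;` would be skipped unless the JVM runs with -ea.
  // This check always runs and states which usage pattern was violated.
  static void checkForwardOnly(int previousOffset, int offset) {
    if (offset < previousOffset) {
      throw new IllegalStateException(
          "following(" + offset + ") called after following(" + previousOffset + "): "
              + "this BreakIterator only supports the UnifiedHighlighter's forward-only call pattern");
    }
  }
}
```

The cost is one comparison per call even when assertions are disabled; the benefit is a clear failure if the iterator is ever used outside the pattern it was built for.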






> UnifiedHighlighter: add target character width BreakIterator wrapper
> --------------------------------------------------------------------
>
>                 Key: LUCENE-7620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7620
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general-purpose 
> java.text.BreakIterator since it would assume it's called exactly the way 
> the UnifiedHighlighter calls it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.
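The proposal above, grouping whole sentences until a target character width is reached, can be sketched as a plain function over a sentence BreakIterator. This is a simplified illustration under stated assumptions: it is not a java.text.BreakIterator subclass and not the attached patch's implementation, and SentenceGroupingSketch/passages are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: split text into passages made of whole sentences.
// Each passage keeps taking sentences until it is at least targetChars long;
// the final passage may be shorter.
final class SentenceGroupingSketch {

  static List<String> passages(String text, int targetChars) {
    BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
    sentences.setText(text);
    List<String> out = new ArrayList<>();
    int passageStart = sentences.first();
    for (int end = sentences.next(); end != BreakIterator.DONE; end = sentences.next()) {
      if (end - passageStart >= targetChars) {
        out.add(text.substring(passageStart, end)); // close this passage at a sentence boundary
        passageStart = end;
      }
    }
    if (passageStart < text.length()) {
      out.add(text.substring(passageStart)); // leftover sentences shorter than the target
    }
    return out;
  }
}
```

This greedy rule always ends a passage at the first sentence boundary at or past the target, so a long sentence can push a passage well beyond it; choosing the nearest boundary instead is the alternative raised in the comment.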



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
