[ 
https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806526#comment-15806526
 ] 

David Smiley edited comment on LUCENE-7620 at 1/7/17 2:21 AM:
--------------------------------------------------------------

bq. (Tim) For the following method, does it make sense to return the baseIter 
if the followingIdx < startIndex? Maybe throw an exception instead or just have 
an assert that it's less?

It's already asserting (line 124).  Or I'm not understanding you.

bq.  (Tim) This is subjective, but I find it's more useful to break out the 
different tests with methods for each condition. For example: breakAtGoal, 
breakLessThanGoal, breakMoreThanGoal, breakGoalPlusRandom, etc. Similar for the 
defaultSummary tests. This helps when coming back to the test and helps tease 
apart if one piece of functionality is broken vs another.

Fair point.  A better compromise in my mind that is not as verbose as your 
suggestion is to use the "message" parameter of the assert methods.  I will do 
this and upload a new patch tonight.

bq.  (Jim) ... but I wonder if the logic to get the boundary could not be 
simplified. Isn't it possible to always invoke baseIter.preceding(targetIdx) 
and based on isMinimumSize return current() or baseIter.next() ?

No; I don't think so. If one looks at {{preceding(target)}}, you don't know if 
it's result is closer to the target than the following break or not.  The 
"target" mode of this BI gets the _closest_ break.  Come to think of it, maybe 
I should rename {{createTargetLength}} to be {{createClosestToLength}}.  At 
least it's javadocs are already clear I think?


was (Author: dsmiley):
bq. (Tim) For the following method, does it make sense to return the baseIter 
if the followingIdx < startIndex? Maybe throw an exception instead or just have 
an assert that it's less?

It's already asserting (line 124).  Or I'm not understanding you.

bq.  (Tim) This is subjective, but I find it's more useful to break out the 
different tests with methods for each condition. For example: breakAtGoal, 
breakLessThanGoal, breakMoreThanGoal, breakGoalPlusRandom, etc. Similar for the 
defaultSummary tests. This helps when coming back to the test and helps tease 
apart if one piece of functionality is broken vs another.

Fair point.  A better compromise in my mind that is not as verbose as your 
suggestion is to use the "message" parameter of the assert methods.  I will do 
this and upload a new patch tonight.

bq.  (Jim) ... but I wonder if the logic to get the boundary could not be 
simplified.
Isn't it possible to always invoke baseIter.preceding(targetIdx) and based on 
isMinimumSize return current() or baseIter.next() ?

No; I don't think so. If one looks at {{preceding(target)}}, you don't know if 
it's result is closer to the target than the following break or not.  The 
"target" mode of this BI gets the _closest_ break.  Come to think of it, maybe 
I should rename {{createTargetLength}} to be {{createClosestToLength}}.  At 
least it's javadocs are already clear I think?

> UnifiedHighlighter: add target character width BreakIterator wrapper
> --------------------------------------------------------------------
>
>                 Key: LUCENE-7620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7620
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 6.4
>
>         Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch, 
> LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to