Traktormaster commented on a change in pull request #1123: LUCENE-9093: Unified 
highlighter with word separator never gives context to the left
URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361650672
 
 

 ##########
 File path: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java
 ##########
 @@ -174,7 +175,48 @@ private int moveToBreak(int idx) { // precondition: idx 
is a known break
   // called at start of new Passage given first word start offset
   @Override
   public int preceding(int offset) {
-    return baseIter.preceding(offset); // no change needed
+    final int fragmentStart = Math.max(baseIter.preceding(offset), 0); // 
convert DONE to 0
+    fragmentEndFromPreceding = baseIter.following(fragmentStart);
+    if (fragmentEndFromPreceding == DONE) {
+      fragmentEndFromPreceding = baseIter.last();
+    }
+    final int centerLength = fragmentEndFromPreceding - fragmentStart;
+    final int extraPrecedingLengthGoal = (int)((lengthGoal - centerLength) * 
fragmentAlignment);
 
 Review comment:
   I have a working version with this new behavior. There is just a single 
assertion that fails 
[here](https://github.com/apache/lucene-solr/blob/e06ad4cfb587e1bb69a80c0b531df0e52c2da89b/solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java#L267).
 It's because the `fragsize=60` there means to try and get 60 characters on the 
right side of the match as the default `fragalign` is zero. To pass the test 
the query parameters need to be changed to have 
`hl.fragsize=50&hl.fragalign=0.5`.
   This is not a problem by itself, but there is no way to make the new 
behavior completely backwards-compatible like the previous one.
   
   It could also be a bit jarring to have `fragsize` with a distinctively 
different meaning compared to the other highlighters:
   - For original&fastVector the `fragsize` is the target size of the snippet.
   - For unified the `fragsize` is the size of the contextual text around the 
match. Ie.: the snippet's target size is `fragsize+matchsize`.
   
   I'm still in favor of this change, there's literally half as many calls to 
`preceding()` and `following()` of the underlying BI and it's much clearer as 
well.
   
   Should I change the failing test and push this for review? What about the 
backward-compatibility? I'm not sure how to change the docs or the default 
parameters.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to