rzo1 opened a new pull request, #985:
URL: https://github.com/apache/opennlp/pull/985

   Note: Requires https://github.com/apache/opennlp/pull/984 first.
   
     - Fix multi-period skip logic in `sentPosDetect` incorrectly skipping real sentence boundaries when the next sentence starts immediately with an abbreviation (e.g., "Gedanken.Bek."). The skip is now bypassed when the character after the delimiter is uppercase, indicating a new sentence rather than a multi-period abbreviation like "z.B."

     - Generalize abbreviation-at-segment-start check in `isAcceptableBreak` from `tokenStartPos == 0` to `tokenStartPos == fromIndex`, so abbreviations are correctly recognized at segment boundaries, not just at the start of the entire text.
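
   The two changes above can be illustrated with a minimal sketch. This is not the code from the pull request: the class `SentenceBreakSketch` and the helpers `skipMultiPeriod` and `startsSegment` are hypothetical names, and the token scan is a simplified assumption about how a multi-period token might be detected. Only the uppercase bypass and the `tokenStartPos == fromIndex` comparison come from the description.

```java
// Illustrative sketch only; names and scanning logic are assumptions,
// not the actual OpenNLP implementation.
final class SentenceBreakSketch {

    // Returns true when the delimiter at delimiterPos looks like an inner
    // period of a multi-period token and should therefore be skipped.
    static boolean skipMultiPeriod(String text, int delimiterPos) {
        int next = delimiterPos + 1;
        // A delimiter at the end of the text or before whitespace ends the token.
        if (next >= text.length() || Character.isWhitespace(text.charAt(next))) {
            return false;
        }
        // Change 1: an uppercase character directly after the delimiter is
        // treated as the start of a new sentence, so the skip is bypassed.
        if (Character.isUpperCase(text.charAt(next))) {
            return false;
        }
        // Assumed heuristic: skip only if another period follows before the
        // next whitespace, i.e. the token contains several periods.
        for (int i = next; i < text.length() && !Character.isWhitespace(text.charAt(i)); i++) {
            if (text.charAt(i) == '.') {
                return true;
            }
        }
        return false;
    }

    // Change 2: a token starts the segment when it begins at fromIndex,
    // rather than only when it begins at position 0 of the whole text.
    static boolean startsSegment(int tokenStartPos, int fromIndex) {
        return tokenStartPos == fromIndex;
    }
}
```

   Under these assumptions, the period after "Gedanken" in "Gedanken.Bek." is no longer skipped because "B" is uppercase, while a lowercase continuation such as the "g" in "e.g." still triggers the skip.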


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
