krickert commented on PR #1103:
URL: https://github.com/apache/opennlp/pull/1103#issuecomment-4764287154

   **Status since the last review.** Offset-model items addressed; additive 
commits, so inline threads stay anchored.
   
   - `buildAligned()` + `OffsetAwareNormalizer` give the `*Aligned` API a real 
consumer: every per-code-point fold (whitespace, line-break-preserving 
whitespace, dashes, invisible-strip, quotes, digits, ellipsis, bullets, umlaut) 
composes into one `AlignedText`. Folds that route through 
`java.text.Normalizer` or JDK case mapping are rejected loudly, naming the rung.
   - Capital eszett U+1E9E folds to `SS`. The `buildAligned()` rejection message 
states the rule instead of a stale list. `Confusables` javadoc is scoped to the 
skeleton plus equality test (restriction-level, mixed-script, and bidi checks are 
out of scope). An empty aligned pipeline normalizes input to one `String`.
   - `Alignment.andThen` leading-insertion is not a bug: `Math.max(start, end)` 
already yields the zero-width span. Added a test that proves it.
   - New tests: CharClass plain-vs-aligned parity battery, leading-insertion 
compose, capital-eszett offsets, `buildAligned()` rejection at every index and 
fold type, and `toNormalizedSpan` not over-covering across deletions.
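
   For readers following the offset model, here is a minimal, self-contained 
sketch of what a per-code-point fold with offset tracking can look like. It is 
not the code in this PR: `AlignedFoldSketch`, `Aligned`, `fold`, and 
`toOriginalSpan` are hypothetical names, and the only folds modeled are the 
capital-eszett expansion (U+1E9E to `SS`) and a zero-width-space strip 
(U+200B). The real `AlignedText` and `Alignment` types may differ in shape and 
behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; these are not OpenNLP types. */
final class AlignedFoldSketch {

    /** Normalized text plus, per normalized char, the original [start, end) it came from. */
    static final class Aligned {
        final String text;
        private final int[] origStart;
        private final int[] origEnd;

        Aligned(String text, int[] origStart, int[] origEnd) {
            this.text = text;
            this.origStart = origStart;
            this.origEnd = origEnd;
        }

        /** Maps a non-empty normalized span [from, to) back to original offsets. */
        int[] toOriginalSpan(int from, int to) {
            if (from < 0 || to > text.length() || from >= to) {
                throw new IllegalArgumentException("expected a non-empty span inside the normalized text");
            }
            return new int[] {origStart[from], origEnd[to - 1]};
        }
    }

    /** Assumed fold set: U+200B is dropped, U+1E9E becomes "SS", everything else is copied. */
    static Aligned fold(String input) {
        StringBuilder out = new StringBuilder();
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            int next = i + Character.charCount(cp);
            String replacement;
            if (cp == 0x200B) {
                replacement = "";                 // deletion: no output char, no span entry
            } else if (cp == 0x1E9E) {
                replacement = "SS";               // expansion: both chars map to the same source
            } else {
                replacement = new String(Character.toChars(cp));
            }
            for (int k = 0; k < replacement.length(); k++) {
                spans.add(new int[] {i, next});   // every output char remembers its source range
            }
            out.append(replacement);
            i = next;
        }
        int[] starts = new int[spans.size()];
        int[] ends = new int[spans.size()];
        for (int k = 0; k < spans.size(); k++) {
            starts[k] = spans.get(k)[0];
            ends[k] = spans.get(k)[1];
        }
        return new Aligned(out.toString(), starts, ends);
    }

    public static void main(String[] args) {
        Aligned a = fold("Ma\u1E9Ee\u200B\u200B!");
        int[] ss = a.toOriginalSpan(2, 4);
        System.out.println(a.text + " SS -> [" + ss[0] + ", " + ss[1] + ")");
    }
}
```

   In this sketch the two-char `SS` maps back to the single original eszett, 
and a span that straddles the stripped characters widens to cover them on the 
original side. Folds that cannot report a per-code-point source range (the 
`java.text.Normalizer` and case-mapping cases above) have no natural place in 
this loop, which is one way to read why they are rejected.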


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
