krickert commented on PR #1103:
URL: https://github.com/apache/opennlp/pull/1103#issuecomment-4763023282
**Dimension javadoc forward-references Term/TermAnalyzer.**
`Dimension` references `Term`/`TermAnalyzer` with `{@code}`, not `{@link}`,
so standalone javadoc on this branch produces no unresolved-reference warnings.
**Offset mapping isn't reachable through the builder.**
You found an offset in my impl (pun intended), and the root cause was a
missing composition primitive: there was no way to combine the per-stage offset
maps. I removed `OffsetMapping` and added `Alignment.andThen`, so an
offset-carrying pipeline is now possible. Wiring it through
`TextNormalizer.build()` for arbitrary `CharSequenceNormalizer`s is a follow-up:
only the `CharClass`-family transforms can emit an alignment cheaply, and
`java.text.Normalizer`-based stages would need ICU-style edit callbacks. The
primitive it depends on, however, is in place.
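To illustrate what "combining per-stage offset maps" means, here is a minimal sketch of composition. The class below is hypothetical and is not the PR's `Alignment`. It assumes each alignment stores, for every offset in a stage's output (plus the end offset), the matching offset in that stage's input. `andThen` mirrors `Function.andThen`: `this` runs first, then `next`.

```java
/**
 * Hypothetical sketch, not the PR's Alignment: maps each offset in a stage's
 * output text (0..outputLength inclusive) back to an offset in its input text.
 */
final class Alignment {
  // inputOffsets[i] = input offset for output offset i; length is outputLength + 1
  private final int[] inputOffsets;

  Alignment(int[] inputOffsets) {
    if (inputOffsets.length == 0) {
      throw new IllegalArgumentException("an alignment needs at least the end offset");
    }
    this.inputOffsets = inputOffsets.clone();
  }

  /** An alignment for a stage that leaves text of the given length unchanged. */
  static Alignment identity(int length) {
    int[] map = new int[length + 1];
    for (int i = 0; i <= length; i++) {
      map[i] = i;
    }
    return new Alignment(map);
  }

  int outputLength() {
    return inputOffsets.length - 1;
  }

  int inputOffset(int outputOffset) {
    return inputOffsets[outputOffset];
  }

  /** Composes this (A to B) with next (B to C) into an alignment mapping C offsets back to A. */
  Alignment andThen(Alignment next) {
    int[] composed = new int[next.inputOffsets.length];
    for (int i = 0; i < composed.length; i++) {
      int mid = next.inputOffsets[i]; // offset in this stage's output
      if (mid < 0 || mid > outputLength()) {
        throw new IllegalArgumentException(
            "next stage refers to offset " + mid + " outside this stage's output");
      }
      composed[i] = inputOffsets[mid];
    }
    return new Alignment(composed);
  }
}
```

For example, if stage 1 turns `"a-b"` into `"ab"` (offsets `{0, 2, 3}`) and stage 2 drops the first character (`{1, 2}`), the composed alignment maps the final `"b"` at offset 0 back to offset 2 in the original text.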
**OffsetMap buffer growth overflows past ~2^30.**
`OffsetMap` is removed. Its replacement, `Alignment.Builder`, grows its buffer
in an overflow-aware way (`length + (length >> 1)`, clamped to
`Integer.MAX_VALUE - 8`), so it fails with a clean `OutOfMemoryError` instead of
a `NegativeArraySizeException`. `WordSegmenter.IntList` got the same treatment
(see #1104).
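A minimal sketch of that growth policy, using a hypothetical `GrowableIntBuffer` rather than the PR's `Alignment.Builder` or `IntList`. This version does the 1.5x step in `long` arithmetic so the sum cannot wrap negative before the clamp; the PR's exact code may detect overflow differently.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical growable int buffer illustrating overflow-aware growth. */
final class GrowableIntBuffer {
  // Soft cap also used by JDK collections: some VMs reserve header words in arrays.
  static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

  private int[] data = new int[16];
  private int size;

  void add(int value) {
    if (size == data.length) {
      data = Arrays.copyOf(data, newCapacity(data.length, size + 1));
    }
    data[size++] = value;
  }

  /** Grows by about 1.5x, clamped to MAX_ARRAY_LENGTH, but never below minCapacity. */
  static int newCapacity(int length, int minCapacity) {
    if (minCapacity < 0 || minCapacity > MAX_ARRAY_LENGTH) {
      // Fail cleanly instead of letting a wrapped negative size reach Arrays.copyOf.
      throw new OutOfMemoryError("Required array length too large: " + minCapacity);
    }
    long grown = (long) length + (length >> 1); // cannot overflow in long
    long clamped = Math.min(grown, MAX_ARRAY_LENGTH);
    return (int) Math.max(clamped, minCapacity);
  }

  int size() {
    return size;
  }

  int get(int index) {
    Objects.checkIndex(index, size);
    return data[index];
  }
}
```

With plain `int` arithmetic, `length + (length >> 1)` goes negative once `length` exceeds roughly 1.43 billion, which is where the original `NegativeArraySizeException` came from.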
**Confusables.load() has no per-line guard.**
Fixed. The per-line parse is wrapped so that a malformed line rethrows an
`IllegalStateException` naming the offending line number, mirroring
`CodePointSet.fromFile`, instead of surfacing a raw `ExceptionInInitializerError`.
(A checksum/version assertion on the bundled file is a reasonable follow-up but
is left out of this PR.)
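A minimal sketch of the per-line guard pattern. `ConfusablesLoader` is hypothetical, and the line format is an assumption loosely modeled on Unicode's `confusables.txt` (`source ; target code points ; ...`, with `#` comments), not necessarily what the bundled file or the PR's `Confusables.load()` uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical loader illustrating a per-line parse guard with line numbers. */
final class ConfusablesLoader {
  static Map<Integer, String> load(Reader in) {
    Map<Integer, String> map = new HashMap<>();
    try (BufferedReader reader = new BufferedReader(in)) {
      String line;
      int lineNo = 0;
      while ((line = reader.readLine()) != null) {
        lineNo++;
        int hash = line.indexOf('#');
        String content = (hash >= 0 ? line.substring(0, hash) : line).strip();
        if (content.isEmpty()) {
          continue; // blank or comment-only line
        }
        try {
          // Assumed format: "<hex source> ; <hex target code points> [; ...]"
          String[] fields = content.split(";");
          int source = Integer.parseInt(fields[0].strip(), 16);
          StringBuilder target = new StringBuilder();
          for (String cp : fields[1].strip().split("\\s+")) {
            target.appendCodePoint(Integer.parseInt(cp, 16));
          }
          map.put(source, target.toString());
        } catch (RuntimeException e) {
          // Covers NumberFormatException, missing fields, and invalid code points.
          throw new IllegalStateException(
              "Malformed confusables entry at line " + lineNo + ": " + line, e);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return map;
  }
}
```

The caught exception is kept as the cause, so the underlying parse error is still visible alongside the line number.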
**Nit: `serialVersionUID = 1L` vs. random longs; `builder()` returns its own mutable builder.**
Although I'm in the "1L" camp for various reasons, I don't mind either way.
Changing that now.