rzo1 commented on PR #1105: URL: https://github.com/apache/opennlp/pull/1105#issuecomment-4762681049
Thx for the PR. Here are some suggestions: - opennlp-dl now compile-depends on all of opennlp-runtime (the pom promotes it from test to compile) solely to reach CharClass. Before this, the module compiled against opennlp-api plus onnxruntime only; it now transitively bundles the whole tools runtime. - @Deprecated on NameFinderDL.find(String[]) is too broad. It's the canonical TokenNameFinder entry point used by every existing caller, but it only returns wrong offsets when normalizeDashes=true and the input contains a supplementary-plane dash (whitespace folding is length-preserving, so find() stays correct there). As written, every caller gets a deprecation warning for an edge case that can't affect them. Suggest dropping @Deprecated, documenting the caveat in javadoc, and delegating the mapping (or logging once) only when a length-changing fold is active. - findInOriginal is only on the concrete type, not on TokenNameFinder, so interface-typed callers (UIMA, eval harness) can't reach the offset-correct path while the interface path is deprecated. Consider exposing an offset-mapped variant on the interface, or document the asymmetry. - Missing end-to-end test. The full normalize to decode to map-back path with a length-changing fold active is only covered piecemeal. Add one test asserting a span from findInOriginal covers the correct original text when normalizeDashes is on with a non-BMP dash. Also, NameFinderDLEval still calls the now-deprecated find(...). - Nit: README and javadoc say dash folding is "offset preserving for BMP". Worth stating the stronger guarantee that findInOriginal stays correct even for non-BMP dashes via the offset map. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
