rzo1 commented on PR #1105:
URL: https://github.com/apache/opennlp/pull/1105#issuecomment-4762681049

   Thx for the PR. Here are some suggestions:
   
   - opennlp-dl now compile-depends on all of opennlp-runtime (the pom promotes 
it from test to compile) solely to reach CharClass. Before this, the module 
compiled against opennlp-api plus onnxruntime only; it now transitively bundles 
the whole tools runtime.
   - @Deprecated on NameFinderDL.find(String[]) is too broad. It's the 
canonical TokenNameFinder entry point used by every existing caller, but it 
only returns wrong offsets when normalizeDashes=true and the input contains a 
supplementary-plane dash (whitespace folding is length-preserving, so find() 
stays correct there). As written, every caller gets a deprecation warning for 
an edge case that can't affect them. Suggest dropping @Deprecated, documenting 
the caveat in javadoc, and delegating the mapping (or logging once) only when a 
length-changing fold is active.
   - findInOriginal is only on the concrete type, not on TokenNameFinder, so 
interface-typed callers (UIMA, eval harness) can't reach the offset-correct 
path while the interface path is deprecated. Consider exposing an offset-mapped 
variant on the interface, or document the asymmetry.
   - Missing end-to-end test. The full normalize to decode to map-back path 
with a length-changing fold active is only covered piecemeal. Add one test 
asserting a span from findInOriginal covers the correct original text when 
normalizeDashes is on with a non-BMP dash. Also, NameFinderDLEval still calls 
the now-deprecated find(...).
   - Nit: README and javadoc say dash folding is "offset preserving for BMP". 
Worth stating the stronger guarantee that findInOriginal stays correct even for 
non-BMP dashes via the offset map.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to