krickert opened a new pull request, #1076:
URL: https://github.com/apache/opennlp/pull/1076

   ## What
   
   - `find()` leaked native memory on every call: the `OnnxTensor` inputs and
the `OrtSession.Result` created for each sentence chunk were never closed. The
tensors are now released in a `finally` block and the result is closed with
try-with-resources. This is safe because `getValue()` copies the output into
Java arrays before the result is closed.
   - A token missing from the vocabulary made `vocab.get(...)` return `null`,
which was then auto-unboxed into an `int` and threw an opaque
`NullPointerException`. The mapping loop is now a testable `tokenIds()` helper
that throws an `IllegalArgumentException` naming the missing token. Such a miss
indicates that the vocabulary file does not match the model.
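   A minimal sketch of the cleanup pattern from the first bullet, assuming a
simplified flow with two input tensors per chunk. `FakeTensor`, `FakeResult`,
and `run` are hypothetical stand-ins for `OnnxTensor`, `OrtSession.Result`, and
the session's inference call, so the example runs without ONNX Runtime. The
input names `input_ids` and `attention_mask` are also assumptions; the real
`NameFinderDL` code may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkCleanupSketch {

  /** Records every close() call so the cleanup order can be checked. */
  static final List<String> CLOSED = new ArrayList<>();

  /** Hypothetical stand-in for an input OnnxTensor. */
  static final class FakeTensor implements AutoCloseable {
    private final String name;
    FakeTensor(String name) { this.name = name; }
    @Override public void close() { CLOSED.add(name); }
  }

  /** Hypothetical stand-in for OrtSession.Result. */
  static final class FakeResult implements AutoCloseable {
    private final float[][] logits;
    FakeResult(float[][] logits) { this.logits = logits; }

    /** Like getValue(): returns a Java copy that stays valid after close(). */
    float[][] getValue() {
      float[][] copy = new float[logits.length][];
      for (int i = 0; i < logits.length; i++) {
        copy[i] = logits[i].clone();
      }
      return copy;
    }

    @Override public void close() { CLOSED.add("result"); }
  }

  /** Hypothetical inference call; `fail` simulates a runtime error. */
  static FakeResult run(FakeTensor ids, FakeTensor mask, boolean fail) {
    if (fail) {
      throw new IllegalStateException("inference failed");
    }
    return new FakeResult(new float[][] {{0.1f, 0.9f}});
  }

  /** Processes one sentence chunk without leaking any native-backed object. */
  static float[][] processChunk(boolean fail) {
    FakeTensor ids = new FakeTensor("input_ids");
    FakeTensor mask = null;
    try {
      mask = new FakeTensor("attention_mask");
      // The result is closed on exit; the copied array outlives it.
      try (FakeResult result = run(ids, mask, fail)) {
        return result.getValue();
      }
    } finally {
      // Inputs are released whether inference succeeds or throws.
      ids.close();
      if (mask != null) {
        mask.close();
      }
    }
  }
}
```

The inner try-with-resources closes the result as soon as its copy has been
taken, and the outer `finally` releases the inputs even when inference throws.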
   
   ## Why
   
   See [OPENNLP-1840](https://issues.apache.org/jira/browse/OPENNLP-1840). 
Long-running services calling `find()` repeatedly accumulate off-heap 
allocations until the process is killed, while the Java heap looks healthy. 
This completes the resource-management pattern already applied to `SentenceVectorsDL`
(OPENNLP-1836, #1072) and `DocumentCategorizerDL` (OPENNLP-1839, #1074).
   
   ## Notes
   
   - `find()` results are unchanged, so the `NameFinderDLEval` expectations are 
unaffected.
   - `tokenIds()` intentionally mirrors the helper added to 
`DocumentCategorizerDL` in #1074; consolidating both into `AbstractDL` is a 
natural cleanup once both PRs are merged.
   
   ## Validation
   
   A new `NameFinderDLTest` covers the token-id mapping and the vocabulary-miss
error. All existing `opennlp-dl` tests pass.
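   The token-id mapping and the vocabulary-miss error that the new test covers
can be sketched as follows. The `tokenIds(String[], Map<String, Integer>)`
signature and the message text are assumptions for illustration; the helper in
this PR may take different arguments.

```java
import java.util.Map;

public class TokenIdsSketch {

  /**
   * Hypothetical version of the helper: maps each token to its vocabulary id
   * and fails with a descriptive message on a miss instead of unboxing null.
   */
  static int[] tokenIds(String[] tokens, Map<String, Integer> vocab) {
    int[] ids = new int[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      // Keep the lookup boxed so a missing token shows up as null.
      Integer id = vocab.get(tokens[i]);
      if (id == null) {
        throw new IllegalArgumentException("Token '" + tokens[i]
            + "' is not in the vocabulary; the vocabulary file may not match the model");
      }
      ids[i] = id;
    }
    return ids;
  }
}
```

Reading the lookup into a boxed `Integer` first is what makes the miss
detectable. Assigning `vocab.get(...)` directly to an `int` unboxes `null` and
throws the opaque `NullPointerException` instead.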


-- 
This is an automated message from the Apache Git Service.