krickert opened a new pull request, #1076: URL: https://github.com/apache/opennlp/pull/1076

## What

- `find()` leaked native memory on every call: the `OnnxTensor` inputs and the `OrtSession.Result` were never closed for each sentence chunk. Tensors are now released in a `finally` block and the result via try-with-resources (`getValue()` copies into Java arrays first, so this is safe).
- A token missing from the vocabulary caused `vocab.get(...)` to auto-unbox `null` into an `int`, throwing an opaque `NullPointerException`. The mapping loop is now a testable `tokenIds()` helper that throws `IllegalArgumentException` naming the missing token, which indicates the vocabulary file does not match the model.

## Why

See [OPENNLP-1840](https://issues.apache.org/jira/browse/OPENNLP-1840). Long-running services calling `find()` repeatedly accumulate off-heap allocations until the process is killed, while the Java heap looks healthy. This completes the resource-management pattern applied to `SentenceVectorsDL` (OPENNLP-1836, #1072) and `DocumentCategorizerDL` (OPENNLP-1839, #1074).

## Notes

- `find()` results are unchanged, so the `NameFinderDLEval` expectations are unaffected.
- `tokenIds()` intentionally mirrors the helper added to `DocumentCategorizerDL` in #1074; consolidating both into `AbstractDL` is a natural cleanup once both PRs are merged.

## Validation

New `NameFinderDLTest` covers the token-id mapping and the vocabulary-miss error. All existing `opennlp-dl` tests pass.
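For readers unfamiliar with the cleanup pattern described under "What", here is a minimal sketch of its shape. It is not the PR's code: `FakeNative`, `run()`, and `inferChunk()` are hypothetical stand-ins used so the example runs without ONNX Runtime on the classpath. In the actual change the closeable objects are `OnnxTensor` and `OrtSession.Result`, both of which implement `AutoCloseable`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NativeCleanupSketch {

  // Counts stand-in "native" resources that are currently open.
  static int open = 0;

  // Hypothetical stand-in for a native-backed tensor or session result.
  static final class FakeNative implements AutoCloseable {
    private final float[] data;

    FakeNative(float[] data) {
      this.data = data;
      open++;
    }

    // Copies the values into a plain Java array, so they outlive close().
    float[] copyToJava() {
      return data.clone();
    }

    @Override
    public void close() {
      open--;
    }
  }

  // Hypothetical stand-in for a session run; returns fixed dummy scores.
  static FakeNative run(Map<String, FakeNative> inputs) {
    return new FakeNative(new float[] {0.1f, 0.9f});
  }

  // Processes one chunk and releases everything it allocated.
  static float[] inferChunk(long[] ids) {
    final Map<String, FakeNative> inputs = new LinkedHashMap<>();
    try {
      inputs.put("input_ids", new FakeNative(new float[ids.length]));
      inputs.put("attention_mask", new FakeNative(new float[ids.length]));
      // The result is closed automatically once the copy has been made.
      try (FakeNative result = run(inputs)) {
        return result.copyToJava();
      }
    } finally {
      // Inputs are released even if creating a later input or running fails.
      for (FakeNative tensor : inputs.values()) {
        tensor.close();
      }
    }
  }

  public static void main(String[] args) {
    float[] scores = inferChunk(new long[] {1L, 2L, 3L});
    System.out.println("scores=" + scores.length + ", still open=" + open);
  }
}
```

Filling the input map inside the `try` means a failure while creating the second input still releases the first.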

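The vocabulary-miss behaviour can be illustrated the same way. This is a sketch under assumptions, not the helper from the PR: the signature of `tokenIds()` shown here, the exception message wording, and the vocabulary entries are all made up for illustration. The point is only that the lookup result stays a boxed `Integer` until it has been checked for `null`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class TokenIdsSketch {

  // Hypothetical signature; the helper in the PR may take different arguments.
  static int[] tokenIds(String[] tokens, Map<String, Integer> vocab) {
    final int[] ids = new int[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      // Keep the boxed Integer so a miss is detected before any unboxing.
      final Integer id = vocab.get(tokens[i]);
      if (id == null) {
        throw new IllegalArgumentException("Token '" + tokens[i]
            + "' is not in the vocabulary; the vocabulary file may not match the model.");
      }
      ids[i] = id;
    }
    return ids;
  }

  public static void main(String[] args) {
    // Illustrative vocabulary; these ids are invented.
    final Map<String, Integer> vocab = new HashMap<>();
    vocab.put("[CLS]", 0);
    vocab.put("george", 1);
    vocab.put("[SEP]", 2);

    System.out.println(Arrays.toString(
        tokenIds(new String[] {"[CLS]", "george", "[SEP]"}, vocab)));

    try {
      tokenIds(new String[] {"unseen"}, vocab);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Writing `ids[i] = vocab.get(tokens[i])` directly would compile, but it unboxes a `null` on a miss and fails with a `NullPointerException` that does not name the offending token.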