Re: [PR] OPENNLP-1845 - Fix numerically unstable softmax in DocumentCategorizerDL (opennlp)

via GitHub Sun, 14 Jun 2026 12:02:54 -0700


krickert commented on PR #1085:
URL: https://github.com/apache/opennlp/pull/1085#issuecomment-4702739431


   ## Summary
   
   On an inference failure, the previous code returned an all-zero `double[]`. 
That isn't a valid probability distribution (it doesn't sum to 1), so any 
downstream `getBestCategory` call or threshold check silently picks garbage, and 
the real failure surfaces far from its cause.
   
   `categorize(...)` now fails loudly, and distinguishes the *kind* of failure 
instead of lumping everything into one method-wide `catch (Exception)`:
   
   - **Malformed input** (`strings` null or empty) throws 
`IllegalArgumentException`, validated up front.
   - **Inference failure** (an `OrtException`, or any runtime fault while 
executing the model) throws `IllegalStateException` with the cause preserved. 
The model execution is extracted into a private `infer(...)` helper so the wrap 
is scoped to it, not the whole method.
   - **Unexpected model output shape** throws its own `IllegalStateException`, 
surfaced directly rather than being re-wrapped as an "inference failed" cause.
   
   `scoreMap` / `sortedScoreMap` inherit this, since they delegate to 
`categorize`.
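
For readers skimming without the diff open, the three failure paths above can be sketched roughly as follows. This is a simplified illustration, not the code in the PR: the `Model` interface, the `FailLoudSketch` class, and the `categoryCount` field are hypothetical stand-ins for the ONNX Runtime session and the configured categories, and the softmax step is left out.

```java
// Hypothetical sketch of the fail-loud structure; names are illustrative only.
final class FailLoudSketch {

    /** Stand-in for the model session (assumption; the real code uses ONNX Runtime). */
    interface Model {
        float[] run(String text) throws Exception;
    }

    private final Model model;
    private final int categoryCount;

    FailLoudSketch(Model model, int categoryCount) {
        this.model = model;
        this.categoryCount = categoryCount;
    }

    double[] categorize(String[] strings) {
        // 1. Malformed input is rejected up front, before any model work.
        if (strings == null || strings.length == 0) {
            throw new IllegalArgumentException("strings must not be null or empty");
        }

        // 2. Model execution lives in infer(...), so the wrap is scoped to it.
        float[] logits = infer(strings);

        // 3. The shape check sits outside infer(...), so it is never
        //    re-wrapped as an "inference failed" cause.
        if (logits == null || logits.length != categoryCount) {
            throw new IllegalStateException("Unexpected model output shape, expected "
                + categoryCount + " values");
        }

        // Softmax omitted here; the raw values are just widened to double.
        double[] scores = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            scores[i] = logits[i];
        }
        return scores;
    }

    private float[] infer(String[] strings) {
        try {
            return model.run(String.join(" ", strings));
        } catch (Exception e) {
            // Checked and runtime faults alike surface with the cause preserved.
            throw new IllegalStateException("Inference failed", e);
        }
    }
}
```

Keeping the shape check outside the `try` is what lets a caller tell "the model could not run" (cause set) apart from "the model ran but returned something unexpected" (no cause).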
   
   ## Tests
   
   - **softmax**: uniform distribution for equal logits, finiteness for large 
logits (the previous code returned `NaN`), and a reference distribution 
(`softmax([1,2,3])`).
   - **fail-loud**: `categorize`, `scoreMap`, and `sortedScoreMap` surface an 
`IllegalStateException` on inference failure; malformed input is rejected with 
`IllegalArgumentException`.
   - **eval**: `DocumentCategorizerDLEval#categorizeFailsLoudlyOnFailure` 
covers the contract end-to-end without requiring `OPENNLP_DATA_DIR`.
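
As a rough illustration of what the softmax tests exercise, here is a minimal sketch of a numerically stable softmax using the standard max-subtraction trick. The class name, method signature, and `float[]` input type are assumptions for illustration; the comment does not show the PR's actual implementation.

```java
// Hypothetical sketch; not the implementation from the PR.
final class SoftmaxSketch {

    private SoftmaxSketch() { }

    /**
     * Softmax with the largest logit subtracted first. Every exponent is
     * then <= 0, so Math.exp cannot overflow to Infinity and the division
     * cannot produce NaN for large finite logits.
     */
    static double[] softmax(float[] logits) {
        if (logits == null || logits.length == 0) {
            throw new IllegalArgumentException("logits must not be null or empty");
        }

        double max = Double.NEGATIVE_INFINITY;
        for (float v : logits) {
            max = Math.max(max, v);
        }

        double[] out = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }

        // sum >= 1 because the max logit contributes exp(0) = 1.
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }
}
```

Because softmax is invariant to adding a constant to every logit, `softmax([1000, 1001, 1002])` yields the same distribution as `softmax([1, 2, 3])`, roughly `[0.0900, 0.2447, 0.6652]`, which is the kind of case the finiteness and reference tests pin down.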
   
   ## Verification
   
   ```
   ./mvnw -pl opennlp-core/opennlp-ml/opennlp-dl test
   # Tests run: 35, Failures: 0, Errors: 0, Skipped: 0 — BUILD SUCCESS
   ```
   

