Hi Martin,

+1 (non-binding)

OPENNLP-1850 is ready for review. If we can get it approved for M4, I can integrate the gRPC server from the sandbox with it faster. However, this is not strictly necessary for this release.

Second suggestion: could we also consider a quick turnaround for an M5 if M4 goes well?

I am fine with proceeding either way.

Best regards,
Kristian Rickert

On Thu, Jun 25, 2026 at 7:58 AM Martin Wiesner <[email protected]> wrote:

> Hi all,
>
> I have posted a first release candidate for the Apache OpenNLP 3.0.0-M4
> release and it is ready for testing.
>
> The 3.x release line of Apache OpenNLP introduces no known breaking
> changes while significantly modularizing the project to improve library
> usage and future extensibility.
> The core API remains stable and fully compatible with 2.x, so existing
> projects can continue using the opennlp-tools artifact without
> modifications.
>
> Key Highlights:
>   • New Features:
>     • Include list of stop words for various languages (OPENNLP-660)
>     • Add SymSpell-based spell correction component (OPENNLP-1832)
>     • Add BertTokenizer with BERT basic tokenization (OPENNLP-1837)
>   • Bug Fixes:
>     • This release ships four bug fixes for: OPENNLP-1826,
>       OPENNLP-1836, OPENNLP-1839, and OPENNLP-1840
>   • Improvements:
>     • Harden SvmDoccatModel.deserialize() with ObjectInputFilter and
>       resource limits (OPENNLP-1823)
>     • Tolerate unsupported XML parser security options (OPENNLP-1835)
>     • Fix NameFinderDL only worked with Person, expand to all types
>       (OPENNLP-1846)
>     • Several updates of dependencies were conducted, see Jira release
>       notes listing - URL down below
>     • Some minor tasks have been completed
>   • IMPORTANT Changes:
>     • The ONNX input encoding in SentenceVectorsDL was fixed, which
>       changes the produced sentence vectors. Any embeddings persisted with the
>       old encoding are not comparable to the new output and must be re-generated.
>       (OPENNLP-1836 - PR #1072)
>     • WordpieceTokenizer (public API, used by opennlp-dl) now splits
>       punctuation runs into single tokens, collapses partially-matched words to a
>       single [UNK], and throws from tokenizePos instead of returning null. These
>       change tokenization output for existing callers. (OPENNLP-1837 - PR #1073)
>     • NameFinderDL now decodes all BIO entity types (PER/ORG/LOC/…)
>       instead of only persons. Span.getType() now returns the entity label rather
>       than the covered text, which is a contract change for existing callers.
>       (OPENNLP-1846 - PR #1086)
>     • The opennlp-dl components are now thread-safe; as part of this,
>       loadVocab became public static (source- and binary-incompatible) and
>       AbstractDL's implicit no-arg constructor was removed. Both affect
>       downstream code that calls loadVocab or extends AbstractDL. (OPENNLP-1844 -
>       PR #1084)
>
> Thank you to everyone who contributed to this release, including all of
> our users and the people who submitted bug reports, contributed code or
> documentation enhancements.
>
> The release was made using the OpenNLP release process, documented on the
> website:
> https://opennlp.apache.org/release.html
>
> Maven Repo:
> https://repository.apache.org/content/repositories/orgapacheopennlp-1070
>
> <repositories>
>   <repository>
>     <id>opennlp-3.0.0-M4-RC1</id>
>     <name>Testing OpenNLP 3.0.0-M4 release candidate</name>
>     <url>
>       https://repository.apache.org/content/repositories/orgapacheopennlp-1070
>     </url>
>   </repository>
> </repositories>
>
> Binaries & Source:
>
> https://dist.apache.org/repos/dist/dev/opennlp/opennlp-3.0.0-M4-rc1/
>
> Tag:
>
> https://github.com/apache/opennlp/releases/tag/opennlp-3.0.0-M4
>
> Tag Hash: 1e05d1ef5a7c35b83015ebce87bb9a43c55e2226
>
> Release notes:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311215&version=12356941
>
> The results of the eval tests for the aforementioned tag can be found
> here: https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-releases/35/
>
> Reminder: The up-2-date KEYS file for signature verification can be
> found here: https://dist.apache.org/repos/dist/release/opennlp/KEYS
>
> Checklist for reference:
>
> [ ] Both source (tar.gz/zip) and binary artifacts (tar.gz/zip) are
> present, along with .asc and .sha512 files for each.
> [ ] PGP signatures are valid for the release artifacts using the KEYS file
> from dist.apache.org
> [ ] SHA512 checksums are correct and verified.
> [ ] LICENSE and NOTICE files exist and are accurate.
> [ ] No unexpected binary files in the source release.
> [ ] All source files have appropriate ASF headers (excluding generated
> files and legacy files).
> [ ] Build completes successfully from source and the instruction to do so
> are clear.
>
> Please vote on releasing these packages as Apache OpenNLP 3.0.0-M4
> The vote is open for at least the next 72 hours.
>
> Only votes from OpenNLP PMC are binding, but everyone is welcome to
> check the release candidate and vote.
> The vote passes if at least three binding +1 votes are cast.
>
> Please VOTE
>
> [+1] go ship it
> [+0] meh, don't care
> [-1] stop, there is a ${showstopper}
>
> Thanks!
> Martin | mawiesne
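
For the "SHA512 checksums are correct and verified" item on the checklist, here is a minimal Java sketch of how one might check a downloaded artifact against its .sha512 file. It is not part of the OpenNLP release process. The class name Sha512Check and its methods are hypothetical, and the sketch assumes the .sha512 file starts with the hex digest, optionally followed by whitespace and a file name. It uses only the JDK (Java 11 or later).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not an OpenNLP or ASF tool.
public class Sha512Check {

  // Streams the input through SHA-512 and returns the lowercase hex digest.
  public static String sha512Hex(InputStream in)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-512");
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      md.update(buf, 0, n);
    }
    StringBuilder sb = new StringBuilder();
    for (byte b : md.digest()) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  // Assumption: the first whitespace-delimited token of the checksum file
  // is the expected hex digest ("<hex>" or "<hex>  <file name>").
  public static boolean matches(Path artifact, Path checksumFile)
      throws IOException, NoSuchAlgorithmException {
    String expected = Files.readString(checksumFile).trim().split("\\s+")[0];
    try (InputStream in = Files.newInputStream(artifact)) {
      return sha512Hex(in).equalsIgnoreCase(expected);
    }
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("usage: java Sha512Check <artifact> <artifact.sha512>");
      System.exit(2);
    }
    boolean ok = matches(Path.of(args[0]), Path.of(args[1]));
    System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
    System.exit(ok ? 0 : 1);
  }
}
```

This covers the checksum only. The PGP signature item still needs a separate check of the .asc files against the KEYS file.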
