Hi Kristian, thx for the initiative which I’d like to support hereby. I’ve been 'off in nature' for some days recently and thus my answer is delayed.

A document-centric approach is well motivated in the Jira issue. For reasons of simplicity (and neutrality), we could add an opennlp-core API interface 'Document'. This would allow (a) us to model what a document is composed of, and (b) other components to (re-)implement it according to related requirements / ideas, such as those you outlined in OPENNLP-1833 ("OpenNlpDocument", "AnalyzeDocument").

If you want a core-api addition, say for 'Document' or the like, keep in mind we can integrate it with the upcoming 3.0.0-M4. If this is not required / necessary in the first place, that is also fine - we can refactor / extract later on. Currently, as it stands, we're planning to cut a release at the end of June or in early July.

If you want to get things working first, starting in the opennlp-sandbox and evolving the current state seems reasonable, with the core project as the target in future cycles.

The proposed package naming is fine from my pov, cf. the JIRA issue.

My views on your questions in the JIRA description:

ad (1): go for "retain in legacy pkg"
ad (2): I can imagine both paths; 3.1.x is more likely, as it feels 3.0.x is at the door soon (over, or at the end of, summer 2026).
ad (3): stay with Maven (plz) if this is possible. Personally (!), I'm not a big fan of Gradle… - but that is just me speaking, no strong opinion.

Happy to hear others' comments.

Thanks for the ideas and the precise outline of 'em. The direction has a lot of potential.

Best
Martin | mawiesne

> On 22.05.2026 at 12:27, Kristian Rickert <[email protected]> wrote:
>
> Hi OpenNLP devs,
>
> I've opened OPENNLP-1833 to propose evolving the opennlp-sandbox gRPC
> POC into ASF-native modules with a canonical OpenNlpDocument message and
> a primary AnalyzeDocument RPC (org.apache.opennlp.grpc.v1).
>
> JIRA: https://issues.apache.org/jira/browse/OPENNLP-1833
>
> Background: OpenNLP today is primarily in-process (API, CLI, UIMA).
> The sandbox POC (opennlp-grpc) exposes three separate string-based
> services; the ticket proposes a unified document contract and server-side
> pipeline orchestration.
>
> My primary goal is to integrate other language libraries through a gRPC
> contract. This will allow the server to work with OpenNLP. OpenNLP can
> use the client stubs to get data from the server, and the server would also
> use OpenNLP to expose the API to other languages.
>
> To be more specific: I'd like to introduce options that also utilize the
> GPU more directly for embeddings. CUDA for nvidia cards and OpenVINO for
> Intel cards. This would create a middle interface that can hot-swap on the
> server side. Of course, these interfaces would also be their own builds.
>
> I'm planning to work on this in phases as outlined in the ticket:
>
> - Phase 0/1: community RFC + design doc / full .proto definitions
> - Phase 2+: implementation (will work on this while we discuss phase 1,
> but open for changes)
>
> I'd appreciate feedback on a few points called out in the JIRA ticket.
>
> I can get a prototype up within a couple of weeks.
>
> Sandbox reference:
>
> https://github.com/apache/opennlp-sandbox/tree/OPENNLP-1833-grpc-expansion
>
> I'll post design updates and any draft .proto / docs to the ticket.
> Comments on the JIRA or replies to this thread are welcome although JIRA is
> preferred.
>
> Thanks,
> Kristian
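
P.S. To make the 'Document' idea from my reply a bit more concrete, here is a minimal sketch of what such a neutral interface might look like. Everything in it is an assumption for illustration only: the names Document, TextRange and SimpleDocument, and all of their methods, are hypothetical and are not existing opennlp-core API. It only shows the "model it once, (re-)implement it elsewhere" idea, e.g. for a gRPC adapter.

```java
import java.util.List;

// Hypothetical sketch; none of these types exist in OpenNLP today.
public class DocumentSketch {

    // Half-open [start, end) character range into the document text.
    static final class TextRange {
        final int start;
        final int end;

        TextRange(int start, int end) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("invalid range: " + start + ".." + end);
            }
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical neutral contract that a core-api module could define.
    interface Document {
        String id();
        CharSequence text();
        List<TextRange> sentences();
    }

    // One possible implementation another module (say, a gRPC adapter) could supply.
    static final class SimpleDocument implements Document {
        private final String id;
        private final String text;
        private final List<TextRange> sentences;

        SimpleDocument(String id, String text, List<TextRange> sentences) {
            this.id = id;
            this.text = text;
            this.sentences = List.copyOf(sentences); // defensive, unmodifiable copy
        }

        @Override public String id() { return id; }
        @Override public CharSequence text() { return text; }
        @Override public List<TextRange> sentences() { return sentences; }
    }

    public static void main(String[] args) {
        Document doc = new SimpleDocument(
                "doc-1",
                "Hello there. Bye.",
                List.of(new TextRange(0, 12), new TextRange(13, 17)));

        // Print each sentence by slicing the text with its range.
        for (TextRange r : doc.sentences()) {
            System.out.println(doc.text().subSequence(r.start, r.end));
        }
    }
}
```

Keeping the interface free of any transport or pipeline types is the point of the sketch: a sandbox module could map its own message type onto it without the core project depending on gRPC.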
