Hi Jobin,

I would love to see a Python interface for OpenNLP, whether it is via gRPC or a native wrapper. I don't have a strong preference for one over the other. Perhaps others can weigh in.
OpenNLP saw a significant decrease in its user and developer communities when most of NLP moved to Python a few years back. However, it remains a very capable library, and I think having easy access to it from Python would benefit the NLP community.

Regardless of which approach is chosen, I think this would be a great submission for Apache's Community over Code NA conference in September, assuming the conference would fit your schedule and travel requirements. The CFP is open until April 21. I think the other Apache Community over Code conferences have their agendas already set for this year.

https://communityovercode.org/call-for-presentations/

Thanks,
Jeff

On Wed, Mar 12, 2025 at 8:53 AM Richard Zowalla <r...@apache.org> wrote:

> Hi,
>
> Yes. You summarized it correctly.
>
> The following services are currently implemented:
>
> - Sentence Detection
> - Tokenization
> - POS Tagging
>
> The rest of your proposal sounds valid to me.
>
> Currently, a student at our university is researching the performance
> of the gRPC implementation. That might give additional insights in the
> next weeks / months.
>
> Regards
> Richard
>
> > On 10.03.2025 at 14:59, Jobin Sabu <85jobins...@gmail.com> wrote:
> >
> > Dear Richard and Apache OpenNLP Developers,
> >
> > Thank you, Richard, for your valuable feedback and for pointing me to
> > the gRPC work in the sandbox. I’ve taken a closer look at the
> > repository and gained a better understanding of the current
> > implementation. The concept of using gRPC to enable backend
> > interactions with OpenNLP is fascinating, and I can see how this
> > approach can benefit developers across multiple languages.
> >
> > Based on my understanding, the sandbox already includes:
> > 1. A gRPC schema for OpenNLP services with generated Java stubs.
> > 2. A server implementation supporting tasks like POS tagging.
> > 3. An example Python client for interacting with the server.
> >
> > I find the idea of building on this foundation exciting. For my GSoC
> > 2025 project, I’d like to propose focusing on **extending the gRPC
> > approach**, specifically by:
> > - Improving the Python client and packaging it into a library for
> >   distribution via `pip`, making it easier for Python developers to
> >   integrate OpenNLP into their workflows.
> > - Exploring additional OpenNLP features (e.g., Named Entity
> >   Recognition or Sentence Detection) that can be added to the gRPC
> >   service.
> > - Enhancing documentation and providing real-world examples for
> >   Python-based integrations.
> >
> > Alternatively, if the community sees more value in pursuing a native
> > Python wrapper, I’m open to exploring that as well. My primary goal is
> > to align my efforts with OpenNLP’s priorities and deliver something
> > valuable for the community.
> >
> > I’d love to hear your thoughts and suggestions on this approach. If
> > there are specific areas the community would like me to focus on,
> > please let me know so I can refine my proposal accordingly.
> >
> > Thank you again for your guidance and support. I’m eager to hear your
> > feedback and take the next steps toward preparing my GSoC application.
> >
> > **Best regards,**
> > Jobin Sabu
> > Email: 85jobins...@gmail.com
> >
> > On Mon, 10 Mar, 2025, 1:59 pm Richard Zowalla, <r...@apache.org> wrote:
> >
> >> Hi Jobin,
> >>
> >> Thanks for your interest in contributing to OpenNLP!
> >>
> >> You’re absolutely right: most existing Python wrappers are either
> >> outdated or unmaintained, so this is a valuable idea in general. That
> >> said, there has been some work in the sandbox to demonstrate OpenNLP
> >> as a gRPC service:
> >> https://github.com/apache/opennlp-sandbox/tree/main/opennlp-grpc
> >>
> >> With this approach, a Python client can be generated (and perhaps
> >> also put into pip) to communicate with an OpenNLP server.
> >> It might be worth exploring whether extending or improving this setup
> >> aligns with your goals.
> >>
> >> While a native Python wrapper is certainly an option, the gRPC
> >> approach in the sandbox is another viable path. I’d love to hear
> >> thoughts from others on this as well! WDYT?
> >>
> >> Regards
> >> Richard
> >>
> >>
> >>> On 08.03.2025 at 08:53, Jobin Sabu <85jobins...@gmail.com> wrote:
> >>>
> >>> Dear OpenNLP Community,
> >>>
> >>> My name is Jobin Sabu, and I’m a student with a background in Python,
> >>> machine learning, and NLP. I’m excited about the opportunity to
> >>> participate in Google Summer of Code (GSoC) 2025 with Apache OpenNLP
> >>> and contribute to its development.
> >>>
> >>> I’d like to propose a project idea: developing a Python wrapper for
> >>> Apache OpenNLP. The goal is to make OpenNLP’s powerful Java-based NLP
> >>> features (e.g., tokenization, sentence detection, named entity
> >>> recognition) accessible to Python developers. This wrapper would
> >>> bridge Python and Java using libraries like JPype or Py4J, providing a
> >>> user-friendly interface and a pip-installable package.
> >>>
> >>> Here’s an outline of the project:
> >>> 1. Implement Python functions that map to OpenNLP’s core features.
> >>> 2. Ensure seamless interoperability between Python and Java.
> >>> 3. Develop detailed documentation, tutorials, and example scripts.
> >>> 4. Write unit tests for robustness and performance benchmarks.
> >>>
> >>> I believe this project will expand OpenNLP’s usability and attract
> >>> more developers from the Python community. I’d love to hear your
> >>> feedback on this idea. Does it align with the community’s goals? Are
> >>> there any specific areas I should focus on or challenges I should be
> >>> aware of?
> >>>
> >>> Thank you for your time and guidance.
> >>> I look forward to contributing
> >>> to OpenNLP and learning from this amazing
> >>>
> >>> Best regards,
> >>> Jobin Sabu
> >>>
> >>> 85jobins...@gmail.com
> >>
> >>
> >