Dear OpenNLP Developers,

I appreciate the valuable feedback I received earlier regarding my GSoC 2025 proposal on developing a Python wrapper for Apache OpenNLP using gRPC. Based on our discussions, I have refined my proposal to better align with the project’s goals and community needs.

Here’s an overview of the updated plan:

1. Enhancing the Python Client
   - Improve the existing gRPC-based Python client.
   - Package it as a user-friendly library that can be distributed via pip.
   - Provide detailed documentation and usage examples.

2. Expanding OpenNLP’s gRPC Features
   - Extend support beyond Sentence Detection, Tokenization, and POS Tagging.
   - Investigate adding functionality such as Named Entity Recognition (NER) and Chunking.
   - Optimize performance and ensure efficient communication between Java and Python.

3. Addressing Community Needs
   - Improve accessibility and usability for Python developers.
   - Gather feedback from potential users to ensure the wrapper is practical and effective.
   - Work closely with the OpenNLP maintainers to ensure long-term maintainability.

I would love to hear any further suggestions from the community. Specifically:

- Are there additional functionalities that should be prioritized?
- Are there insights from the ongoing research on gRPC performance that I should consider?
- Would a native Python wrapper still be of interest, or is gRPC the preferred approach?

Your feedback will help me finalize the proposal before submission. Thanks again for your support!

Best regards,
Jobin Sabu

On Wed, 12 Mar, 2025, 8:41 pm Jeff Zemerick, <jzemer...@apache.org> wrote:
> Hi Jobin,
>
> I would love to see a Python interface for OpenNLP, whether it is via gRPC
> or a native wrapper. I don't think I have any strong feelings toward one
> more than the other. Perhaps others can weigh in.
>
> OpenNLP saw a significant decrease in its user and developer communities
> when most of NLP moved to Python a few years back. However, it remains a
> very capable library and I think having easy access to it from Python would
> benefit the NLP community.
>
> Regardless of which approach is chosen, I think this would be a great
> submission for Apache's Community over Code NA conference in September,
> assuming the conference would fit your schedule and travel requirements.
> The CFP is open until April 21. I think the other Apache Community over
> Code conferences already have their agendas set for this year.
>
> https://communityovercode.org/call-for-presentations/
>
> Thanks,
> Jeff
>
>
> On Wed, Mar 12, 2025 at 8:53 AM Richard Zowalla <r...@apache.org> wrote:
>
> > Hi,
> >
> > Yes. You summarized it correctly.
> >
> > The following services are currently implemented:
> >
> > - Sentence Detection
> > - Tokenization
> > - POS Tagging
> >
> > The rest of your proposal sounds valid to me.
> >
> > A student at our university is currently researching the performance of
> > the gRPC implementation. That research might provide additional insights
> > in the coming weeks or months.
> >
> > Regards,
> > Richard
> >
> > > On 10.03.2025 at 14:59, Jobin Sabu <85jobins...@gmail.com> wrote:
> > >
> > > Dear Richard and Apache OpenNLP Developers,
> > >
> > > Thank you, Richard, for your valuable feedback and for pointing me to the
> > > gRPC work in the sandbox. I’ve taken a closer look at the repository and
> > > gained a better understanding of the current implementation. The concept of
> > > using gRPC to enable backend interactions with OpenNLP is fascinating, and
> > > I can see how this approach can benefit developers across multiple
> > > languages.
> > >
> > > Based on my understanding, the sandbox already includes:
> > > 1. A gRPC schema for OpenNLP services with generated Java stubs.
> > > 2. A server implementation supporting tasks like POS tagging.
> > > 3. An example Python client for interacting with the server.
> > >
> > > I find the idea of building on this foundation exciting. For my GSoC 2025
> > > project, I’d like to propose focusing on **extending the gRPC approach**,
> > > specifically by:
> > > - Improving the Python client and packaging it into a library for
> > > distribution via `pip`, making it easier for Python developers to integrate
> > > OpenNLP into their workflows.
> > > - Exploring additional OpenNLP features (e.g., Named Entity Recognition or
> > > Sentence Detection) that can be added to the gRPC service.
> > > - Enhancing documentation and providing real-world examples for
> > > Python-based integrations.
> > >
> > > Alternatively, if the community sees more value in pursuing a native Python
> > > wrapper, I’m open to exploring that as well. My primary goal is to align my
> > > efforts with OpenNLP’s priorities and deliver something valuable for the
> > > community.
> > >
> > > I’d love to hear your thoughts and suggestions on this approach. If there
> > > are specific areas the community would like me to focus on, please let me
> > > know so I can refine my proposal accordingly.
> > >
> > > Thank you again for your guidance and support. I’m eager to hear your
> > > feedback and take the next steps toward preparing my GSoC application.
> > >
> > > Best regards,
> > > Jobin Sabu
> > > Email: 85jobins...@gmail.com
> > >
> > > On Mon, 10 Mar, 2025, 1:59 pm Richard Zowalla, <r...@apache.org> wrote:
> > >
> > >> Hi Jobin,
> > >>
> > >> Thanks for your interest in contributing to OpenNLP!
> > >>
> > >> You’re absolutely right: most existing Python wrappers are either outdated
> > >> or unmaintained, so this is a valuable idea in general. That said, there
> > >> has been some work in the sandbox to demonstrate OpenNLP as a gRPC service:
> > >> https://github.com/apache/opennlp-sandbox/tree/main/opennlp-grpc
> > >>
> > >> With this approach, a Python client can be generated (and perhaps also put
> > >> into pip) to communicate with an OpenNLP server. It might be worth
> > >> exploring whether extending or improving this setup aligns with your goals.
> > >>
> > >> While a native Python wrapper is certainly an option, the gRPC approach in
> > >> the sandbox is another viable path. I’d love to hear thoughts from others
> > >> on this as well! WDYT?
> > >>
> > >> Regards,
> > >> Richard
> > >>
> > >>
> > >>> On 08.03.2025 at 08:53, Jobin Sabu <85jobins...@gmail.com> wrote:
> > >>>
> > >>> Dear OpenNLP Community,
> > >>>
> > >>> My name is Jobin Sabu, and I’m a student with a background in Python,
> > >>> machine learning, and NLP. I’m excited about the opportunity to
> > >>> participate in Google Summer of Code (GSoC) 2025 with Apache OpenNLP
> > >>> and contribute to its development.
> > >>>
> > >>> I’d like to propose a project idea: developing a Python wrapper for
> > >>> Apache OpenNLP. The goal is to make OpenNLP’s powerful Java-based NLP
> > >>> features (e.g., tokenization, sentence detection, named entity
> > >>> recognition) accessible to Python developers. This wrapper would
> > >>> bridge Python and Java using libraries like JPype or Py4J, providing a
> > >>> user-friendly interface and a pip-installable package.
> > >>>
> > >>> Here’s an outline of the project:
> > >>> 1. Implement Python functions that map to OpenNLP’s core features.
> > >>> 2. Ensure seamless interoperability between Python and Java.
> > >>> 3. Develop detailed documentation, tutorials, and example scripts.
> > >>> 4. Write unit tests for robustness, along with performance benchmarks.
> > >>>
> > >>> I believe this project will expand OpenNLP’s usability and attract
> > >>> more developers from the Python community. I’d love to hear your
> > >>> feedback on this idea. Does it align with the community’s goals? Are
> > >>> there any specific areas I should focus on or challenges I should be
> > >>> aware of?
> > >>>
> > >>> Thank you for your time and guidance. I look forward to contributing
> > >>> to OpenNLP and learning from this amazing community.
> > >>>
> > >>> Best regards,
> > >>> Jobin Sabu
> > >>>
> > >>> 85jobins...@gmail.com