Dear Jobin,

Thank you for your updated proposal and for taking the time to refine it based on our discussions. Your plan aligns well with the project's goals (IMHO), and I appreciate your focus on usability, performance, and community needs. Here are some of my personal thoughts on your key points:

(1) Python Client

Since the Python client is auto-generated from the gRPC proto definition, we should avoid modifying the generated code directly. Instead, we can design a well-structured interface around it, keeping the generated code hidden from end users. This will make future updates easier without breaking compatibility.

Packaging it for distribution via pip is a great idea. You may want to check the ASF guidelines for PyPI releases to ensure compliance: https://incubator.apache.org/guides/distribution.html#pypi. When we reach that stage, the OpenNLP PMC will need to coordinate with INFRA, but that shouldn't be an issue. Detailed documentation and usage examples would be extremely valuable for adoption, too.

(2) Expanding OpenNLP’s gRPC Features

Expanding support beyond what is currently available is a welcome addition. NER and Chunking would be excellent candidates. Performance testing is currently being explored as part of an ongoing thesis, with results expected around June/July.

(3) Addressing Community Needs

A comparison between OpenNLP and existing Python NLP libraries like nltk and spaCy would help highlight OpenNLP’s advantages and potential use cases. Understanding where OpenNLP fits within the ecosystem will be important for adoption.

Regarding your final questions:

- Prioritizing NER and Chunking seems like a logical next step.
- The upcoming performance research might provide insights on optimizing gRPC communication.
- No strong opinion on a native Python wrapper from my side. However, given that most active contributors are more familiar with Java, a Java-based backend implementation remains the more maintainable choice, IMHO.

Regards
Richard

On 2025/03/22 13:54:40 Jobin Sabu wrote:
> Dear OpenNLP Developers,
>
> I appreciate the valuable feedback I received earlier regarding my GSoC
> 2025 proposal on developing a Python wrapper for Apache OpenNLP using gRPC.
> Based on our discussions, I have refined my proposal to align better with
> the project’s goals and community needs.
>
> Here’s an overview of the updated plan:
>
> 1. Enhancing the Python Client
>
> Improve the existing gRPC-based Python client.
>
> Package it as a user-friendly library that can be distributed via pip.
>
> Provide detailed documentation and usage examples.
>
> 2. Expanding OpenNLP’s gRPC Features
>
> Extend support beyond Sentence Detection, Tokenization, and POS Tagging.
>
> Investigate adding functionalities like Named Entity Recognition (NER) and
> Chunking.
>
> Optimize performance and ensure efficient communication between Java and
> Python.
>
> 3. Addressing Community Needs
>
> Improve accessibility and usability for Python developers.
>
> Gather feedback from potential users to ensure the wrapper is practical and
> effective.
>
> Work closely with OpenNLP maintainers to ensure long-term maintainability.
>
> I would love to hear any further suggestions from the community.
> Specifically:
>
> Are there additional functionalities that should be prioritized?
>
> Any insights from the ongoing research on gRPC performance that I should
> consider?
>
> Would a native Python wrapper still be of interest, or is gRPC the
> preferred approach?
>
> Your feedback will help me finalize the proposal before submission. Thanks
> again for your support!
> Best regards
> Jobin Sabu
>
> On Wed, 12 Mar, 2025, 8:41 pm Jeff Zemerick, <jz...@apache.org> wrote:
>
> > Hi Jobin,
> >
> > I would love to see a Python interface for OpenNLP, whether it is via gRPC
> > or a native wrapper. I don't think I have any strong feelings toward one
> > more than the other. Perhaps others can weigh in.
> >
> > OpenNLP saw a significant decrease in its user and developer communities
> > when most of NLP moved to Python a few years back. However, it remains a
> > very capable library and I think having easy access to it from Python would
> > benefit the NLP community.
> >
> > Regardless of which approach is chosen, I think this would be a great
> > submission for Apache's Community over Code NA conference in September,
> > assuming the conference would fit your schedule and travel requirements.
> > The CFP is open until April 21. I think the other Apache Community over
> > Code conferences have their agendas already set for this year.
> >
> > https://communityovercode.org/call-for-presentations/
> >
> > Thanks,
> > Jeff
> >
> > On Wed, Mar 12, 2025 at 8:53 AM Richard Zowalla <rz...@apache.org> wrote:
> >
> > > Hi,
> > >
> > > Yes. You summarized it correctly.
> > >
> > > The following services are currently implemented:
> > >
> > > - Sentence Detection
> > > - Tokenization
> > > - POS Tagging
> > >
> > > The rest of your proposal sounds valid to me.
> > >
> > > Currently, we have some ongoing research regarding the performance of the
> > > gRPC implementation at our university by a student.
> > > That might give additional insights in the next weeks / months.
> > >
> > > Regards
> > > Richard
> > >
> > > > On 10.03.2025 at 14:59, Jobin Sabu <85...@gmail.com> wrote:
> > > >
> > > > Dear Richard and Apache OpenNLP Developers
> > > >
> > > > Thank you, Richard, for your valuable feedback and for pointing me to the
> > > > gRPC work in the sandbox. I’ve taken a closer look at the repository and
> > > > gained a better understanding of the current implementation. The concept of
> > > > using gRPC to enable backend interactions with OpenNLP is fascinating, and
> > > > I can see how this approach can benefit developers across multiple
> > > > languages.
> > > >
> > > > Based on my understanding, the sandbox already includes:
> > > > 1. A gRPC schema for OpenNLP services with generated Java stubs.
> > > > 2. A server implementation supporting tasks like POS tagging.
> > > > 3. An example Python client for interacting with the server.
> > > >
> > > > I find the idea of building on this foundation exciting. For my GSoC 2025
> > > > project, I’d like to propose focusing on **extending the gRPC approach**,
> > > > specifically by:
> > > > - Improving the Python client and packaging it into a library for
> > > > distribution via `pip`, making it easier for Python developers to integrate
> > > > OpenNLP into their workflows.
> > > > - Exploring additional OpenNLP features (e.g., Named Entity Recognition or
> > > > Sentence Detection) that can be added to the gRPC service.
> > > > - Enhancing documentation and providing real-world examples for
> > > > Python-based integrations.
> > > >
> > > > Alternatively, if the community sees more value in pursuing a native Python
> > > > wrapper, I’m open to exploring that as well. My primary goal is to align my
> > > > efforts with OpenNLP’s priorities and deliver something valuable for the
> > > > community.
> > > >
> > > > I’d love to hear your thoughts and suggestions on this approach. If there
> > > > are specific areas the community would like me to focus on, please let me
> > > > know so I can refine my proposal accordingly.
> > > >
> > > > Thank you again for your guidance and support. I’m eager to hear your
> > > > feedback and take the next steps toward preparing my GSoC application.
> > > >
> > > > **Best regards,**
> > > > Jobin Sabu
> > > > Email: 85jobins...@gmail.com
> > > >
> > > > On Mon, 10 Mar, 2025, 1:59 pm Richard Zowalla, <rz...@apache.org> wrote:
> > > >
> > > >> Hi Jobin,
> > > >>
> > > >> Thanks for your interest in contributing to OpenNLP!
> > > >>
> > > >> You’re absolutely right: most existing Python wrappers are either outdated
> > > >> or unmaintained, so this is a valuable idea in general. That said, there
> > > >> has been some work in the sandbox to demonstrate OpenNLP as a gRPC service:
> > > >> https://github.com/apache/opennlp-sandbox/tree/main/opennlp-grpc
> > > >>
> > > >> With this approach, a Python client can be generated (and perhaps also put
> > > >> into pip) to communicate with an OpenNLP server. It might be worth
> > > >> exploring whether extending or improving this setup aligns with your goals.
> > > >>
> > > >> While a native Python wrapper is certainly an option, the gRPC approach in
> > > >> the sandbox is another viable path. I’d love to hear thoughts from others
> > > >> on this as well! WDYT?
> > > >> Regards
> > > >> Richard
> > > >>
> > > >>> On 08.03.2025 at 08:53, Jobin Sabu <85...@gmail.com> wrote:
> > > >>>
> > > >>> Dear OpenNLP Community,
> > > >>>
> > > >>> My name is Jobin Sabu, and I’m a student with a background in Python,
> > > >>> machine learning, and NLP. I’m excited about the opportunity to
> > > >>> participate in Google Summer of Code (GSoC) 2025 with Apache OpenNLP
> > > >>> and contribute to its development.
> > > >>>
> > > >>> I’d like to propose a project idea: developing a Python wrapper for
> > > >>> Apache OpenNLP. The goal is to make OpenNLP’s powerful Java-based NLP
> > > >>> features (e.g., tokenization, sentence detection, named entity
> > > >>> recognition) accessible to Python developers. This wrapper would
> > > >>> bridge Python and Java using libraries like JPype or Py4J, providing a
> > > >>> user-friendly interface and a pip-installable package.
> > > >>>
> > > >>> Here’s an outline of the project:
> > > >>> 1. Implement Python functions that map to OpenNLP’s core features.
> > > >>> 2. Ensure seamless interoperability between Python and Java.
> > > >>> 3. Develop detailed documentation, tutorials, and example scripts.
> > > >>> 4. Write unit tests for robustness and performance benchmarks.
> > > >>>
> > > >>> I believe this project will expand OpenNLP’s usability and attract
> > > >>> more developers from the Python community. I’d love to hear your
> > > >>> feedback on this idea. Does it align with the community’s goals? Are
> > > >>> there any specific areas I should focus on or challenges I should be
> > > >>> aware of?
> > > >>>
> > > >>> Thank you for your time and guidance. I look forward to contributing
> > > >>> to OpenNLP and learning from this amazing
> > > >>>
> > > >>> Best regards,
> > > >>> Jobin Sabu
> > > >>>
> > > >>> 85jobins...@gmail.com
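
As a rough illustration of point (1) in the reply at the top of this thread (a hand-written interface around the generated gRPC code), the idea might look something like the sketch below. Every name in it is an assumption made for illustration only: `TokenizeRequest`, `TokenizeResponse`, the `Tokenize` method, and `OpenNLPClient` are hypothetical and are not taken from the sandbox's proto definition. A small in-memory stub stands in for the generated one so the example runs without a server.

```python
from dataclasses import dataclass
from typing import List, Protocol


# --- Stand-ins for the auto-generated gRPC code (hypothetical names) ---
@dataclass
class TokenizeRequest:
    text: str


@dataclass
class TokenizeResponse:
    tokens: List[str]


class TokenizerStub(Protocol):
    """Shape the wrapper expects from a generated stub (an assumption)."""

    def Tokenize(self, request: TokenizeRequest) -> TokenizeResponse: ...


class FakeTokenizerStub:
    """In-memory replacement for a real stub; splits on whitespace only."""

    def Tokenize(self, request: TokenizeRequest) -> TokenizeResponse:
        return TokenizeResponse(tokens=request.text.split())


# --- Hand-written public interface: the only part users would import ---
class OpenNLPClient:
    """Facade that hides request/response messages behind plain Python types."""

    def __init__(self, tokenizer_stub: TokenizerStub) -> None:
        self._tokenizer = tokenizer_stub

    def tokenize(self, text: str) -> List[str]:
        # Skip the remote call entirely for empty or whitespace-only input.
        if not text.strip():
            return []
        response = self._tokenizer.Tokenize(TokenizeRequest(text=text))
        return list(response.tokens)


client = OpenNLPClient(FakeTokenizerStub())
print(client.tokenize("OpenNLP runs on the JVM"))
# prints ['OpenNLP', 'runs', 'on', 'the', 'JVM']
```

Because callers only ever see `OpenNLPClient.tokenize`, the generated module could be regenerated from an updated proto file and only the wrapper would need adjusting, which is the compatibility benefit described in the reply.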