Dear Jobin,

Thank you for your updated proposal and for taking the time to refine
it based on our discussions. In my opinion, your plan aligns well with
the project's goals, and I appreciate your focus on usability,
performance, and community needs. Here are some personal thoughts on
your key points:

(1) Python Client

Since the Python client is auto-generated from the gRPC proto
definition, we should avoid modifying the generated code directly.
Instead, we can design a well-structured interface around it, keeping
the generated code hidden from end users. This will ensure easier
updates in the future without breaking compatibility.
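
To illustrate the wrapper idea, here is a minimal sketch rather than a definitive design. All names in it (OpenNLPClient, TokenizerBackend, Token, FakeBackend) are hypothetical and do not come from the sandbox code. The assumption is that a small adapter around the generated stub would implement the backend interface, so users only ever import the hand-written class:

```python
from dataclasses import dataclass
from typing import List, Protocol


class TokenizerBackend(Protocol):
    """What the public wrapper needs from the transport layer.

    Hypothetical interface: in a real package, a thin adapter around the
    generated gRPC stub would implement it.
    """

    def tokenize(self, text: str) -> List[str]: ...


@dataclass(frozen=True)
class Token:
    """Plain Python value type, so generated message types never leak out."""
    text: str
    index: int


class OpenNLPClient:
    """Hand-written facade; the only class end users would import."""

    def __init__(self, backend: TokenizerBackend) -> None:
        # The generated code stays hidden behind this private attribute.
        self._backend = backend

    def tokenize(self, text: str) -> List[Token]:
        if not text.strip():
            return []  # avoid a round trip for empty input
        raw = self._backend.tokenize(text)
        return [Token(text=t, index=i) for i, t in enumerate(raw)]


class FakeBackend:
    """Stand-in for the gRPC adapter so the sketch runs without a server."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()


client = OpenNLPClient(FakeBackend())
tokens = client.tokenize("OpenNLP meets Python")
print([t.text for t in tokens])
```

With this split, regenerating the stubs after a proto change should only require touching the adapter, not the public interface.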

Packaging it for distribution via pip is a great idea. You may want to
check the ASF guidelines for PyPI releases to ensure compliance:
https://incubator.apache.org/guides/distribution.html#pypi

When we reach that stage, the OpenNLP PMC will need to coordinate with
INFRA, but that shouldn't be an issue.

Detailed documentation and usage examples would also be extremely
valuable for adoption.

(2) Expanding OpenNLP’s gRPC Features

Expanding support beyond what is currently available is a welcome
addition. NER and Chunking would be excellent candidates.

Performance testing is currently being explored as part of an ongoing
thesis, with results expected around June/July.

(3) Addressing Community Needs

A comparison between OpenNLP and existing Python NLP libraries like
nltk and spaCy would help highlight OpenNLP’s advantages and potential
use cases. Understanding where OpenNLP fits within the ecosystem will
be important for adoption.
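
One possible starting point for the performance side of such a comparison is a small timing harness like the sketch below. It is only an illustration: the benchmark function and the whitespace baseline are placeholders I made up, and the real candidates (an OpenNLP gRPC client call, nltk or spaCy tokenizers) would have to be plugged in as callables.

```python
import time
from typing import Callable, Dict, List

# Any function that maps a text to a list of tokens can be compared.
Tokenizer = Callable[[str], List[str]]


def benchmark(tokenizers: Dict[str, Tokenizer],
              texts: List[str],
              repeats: int = 3) -> Dict[str, float]:
    """Return the best wall-clock time in seconds per tokenizer."""
    results: Dict[str, float] = {}
    for name, tokenize in tokenizers.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for text in texts:
                tokenize(text)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Placeholder candidate; replace with real library or client calls.
candidates: Dict[str, Tokenizer] = {"whitespace-baseline": str.split}
corpus = ["OpenNLP is a machine learning based toolkit."] * 100
timings = benchmark(candidates, corpus)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Timing alone says nothing about accuracy, so a fair comparison would also need a shared evaluation corpus.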

Regarding your final questions:

- Prioritizing NER and Chunking seems like a logical next step.
- The upcoming performance research might provide insights on
optimizing gRPC communication.
- No strong opinion on a native Python wrapper from my side. However,
given that most active contributors are more familiar with Java, a
Java-based backend implementation remains the more maintainable choice,
IMHO.

Regards
Richard


On 2025/03/22 13:54:40 Jobin Sabu wrote:
> Dear OpenNLP Developers,
> 
> I appreciate the valuable feedback I received earlier regarding my GSoC
> 2025 proposal on developing a Python wrapper for Apache OpenNLP using gRPC.
> Based on our discussions, I have refined my proposal to align better with
> the project’s goals and community needs.
> 
> Here’s an overview of the updated plan:
> 
> 1. Enhancing the Python Client
> 
> Improve the existing gRPC-based Python client.
> 
> Package it as a user-friendly library that can be distributed via pip.
> 
> Provide detailed documentation and usage examples.
> 
> 
> 
> 2. Expanding OpenNLP’s gRPC Features
> 
> Extend support beyond Sentence Detection, Tokenization, and POS Tagging.
> 
> Investigate adding functionalities like Named Entity Recognition (NER) and
> Chunking.
> 
> Optimize performance and ensure efficient communication between Java and
> Python.
> 
> 
> 
> 3. Addressing Community Needs
> 
> Improve accessibility and usability for Python developers.
> 
> Gather feedback from potential users to ensure the wrapper is practical and
> effective.
> 
> Work closely with OpenNLP maintainers to ensure long-term maintainability.
> 
> 
> 
> 
> I would love to hear any further suggestions from the community.
> Specifically:
> 
> Are there additional functionalities that should be prioritized?
> 
> Any insights from the ongoing research on gRPC performance that I should
> consider?
> 
> Would a native Python wrapper still be of interest, or is gRPC the
> preferred approach?
> 
> 
> Your feedback will help me finalize the proposal before submission. Thanks
> again for your support!
> 
> Best regards,
> Jobin Sabu
> 
> On Wed, 12 Mar, 2025, 8:41 pm Jeff Zemerick, <jz...@apache.org> wrote:
> 
> > Hi Jobin,
> >
> > I would love to see a Python interface for OpenNLP, whether it is via gRPC
> > or a native wrapper. I don't think I have any strong feelings toward one
> > more than the other. Perhaps others can weigh in.
> >
> > OpenNLP saw a significant decrease in its user and developer communities
> > when most of NLP moved to Python a few years back. However, it remains a
> > very capable library and I think having easy access to it from Python would
> > benefit the NLP community.
> >
> > Regardless of which approach is chosen, I think this would be a great
> > submission for Apache's Community over Code NA conference in September,
> > assuming the conference would fit your schedule and travel requirements.
> > The CFP is open until April 21. I think the other Apache Community over
> > Code conferences have their agendas already set for this year.
> >
> > https://communityovercode.org/call-for-presentations/
> >
> > Thanks,
> > Jeff
> >
> >
> > On Wed, Mar 12, 2025 at 8:53 AM Richard Zowalla <rz...@apache.org> wrote:
> >
> > > Hi,
> > >
> > > Yes. You summarized it correctly.
> > >
> > > The following services are currently implemented:
> > >
> > > - Sentence Detection
> > > - Tokenization
> > > - POS Tagging
> > >
> > > The rest of your proposal sounds valid to me.
> > >
> > > Currently, a student at our university is researching the performance of
> > > the gRPC implementation.
> > > That might give additional insights in the next weeks / months.
> > >
> > > Regards
> > > Richard
> > >
> > > > On 10.03.2025 at 14:59, Jobin Sabu <85...@gmail.com> wrote:
> > > >
> > > > Dear Richard and Apache OpenNLP Developers,
> > > >
> > > > Thank you, Richard, for your valuable feedback and for pointing me to the
> > > > gRPC work in the sandbox. I’ve taken a closer look at the repository and
> > > > gained a better understanding of the current implementation. The concept of
> > > > using gRPC to enable backend interactions with OpenNLP is fascinating, and
> > > > I can see how this approach can benefit developers across multiple
> > > > languages.
> > > >
> > > > Based on my understanding, the sandbox already includes:
> > > > 1. A gRPC schema for OpenNLP services with generated Java stubs.
> > > > 2. A server implementation supporting tasks like POS tagging.
> > > > 3. An example Python client for interacting with the server.
> > > >
> > > > I find the idea of building on this foundation exciting. For my GSoC 2025
> > > > project, I’d like to propose focusing on **extending the gRPC approach**,
> > > > specifically by:
> > > > - Improving the Python client and packaging it into a library for
> > > > distribution via `pip`, making it easier for Python developers to integrate
> > > > OpenNLP into their workflows.
> > > > - Exploring additional OpenNLP features (e.g., Named Entity Recognition or
> > > > Sentence Detection) that can be added to the gRPC service.
> > > > - Enhancing documentation and providing real-world examples for
> > > > Python-based integrations.
> > > >
> > > > Alternatively, if the community sees more value in pursuing a native Python
> > > > wrapper, I’m open to exploring that as well. My primary goal is to align my
> > > > efforts with OpenNLP’s priorities and deliver something valuable for the
> > > > community.
> > > >
> > > > I’d love to hear your thoughts and suggestions on this approach. If there
> > > > are specific areas the community would like me to focus on, please let me
> > > > know so I can refine my proposal accordingly.
> > > >
> > > > Thank you again for your guidance and support. I’m eager to hear your
> > > > feedback and take the next steps toward preparing my GSoC application.
> > > >
> > > > **Best regards,**
> > > > Jobin Sabu
> > > > Email: 85jobins...@gmail.com
> > > >
> > > > On Mon, 10 Mar, 2025, 1:59 pm Richard Zowalla, <rz...@apache.org> wrote:
> > > >
> > > >> Hi Jobin,
> > > >>
> > > >> Thanks for your interest in contributing to OpenNLP!
> > > >>
> > > >> You’re absolutely right: most existing Python wrappers are either outdated
> > > >> or unmaintained, so this is a valuable idea in general. That said, there
> > > >> has been some work in the sandbox to demonstrate OpenNLP as a gRPC service:
> > > >>
> > > >> https://github.com/apache/opennlp-sandbox/tree/main/opennlp-grpc
> > > >>
> > > >> With this approach, a Python client can be generated (and perhaps also put
> > > >> into pip) to communicate with an OpenNLP server. It might be worth
> > > >> exploring whether extending or improving this setup aligns with your goals.
> > > >>
> > > >> While a native Python wrapper is certainly an option, the gRPC approach in
> > > >> the sandbox is another viable path. I’d love to hear thoughts from others
> > > >> on this as well! WDYT?
> > > >>
> > > >> Regards
> > > >> Richard
> > > >>
> > > >>
> > > >>> On 08.03.2025 at 08:53, Jobin Sabu <85...@gmail.com> wrote:
> > > >>>
> > > >>> Dear OpenNLP Community,
> > > >>>
> > > >>> My name is Jobin Sabu, and I’m a student with a background in Python,
> > > >>> machine learning, and NLP. I’m excited about the opportunity to
> > > >>> participate in Google Summer of Code (GSoC) 2025 with Apache OpenNLP
> > > >>> and contribute to its development.
> > > >>>
> > > >>> I’d like to propose a project idea: developing a Python wrapper for
> > > >>> Apache OpenNLP. The goal is to make OpenNLP’s powerful Java-based NLP
> > > >>> features (e.g., tokenization, sentence detection, named entity
> > > >>> recognition) accessible to Python developers. This wrapper would
> > > >>> bridge Python and Java using libraries like JPype or Py4J, providing a
> > > >>> user-friendly interface and a pip-installable package.
> > > >>>
> > > >>> Here’s an outline of the project:
> > > >>> 1. Implement Python functions that map to OpenNLP’s core features.
> > > >>> 2. Ensure seamless interoperability between Python and Java.
> > > >>> 3. Develop detailed documentation, tutorials, and example scripts.
> > > >>> 4. Write unit tests for robustness and performance benchmarks.
> > > >>>
> > > >>> I believe this project will expand OpenNLP’s usability and attract
> > > >>> more developers from the Python community. I’d love to hear your
> > > >>> feedback on this idea. Does it align with the community’s goals? Are
> > > >>> there any specific areas I should focus on or challenges I should be
> > > >>> aware of?
> > > >>>
> > > >>> Thank you for your time and guidance. I look forward to contributing
> > > >>> to OpenNLP and learning from this amazing
> > > >>>
> > > >>> Best regards,
> > > >>> Jobin Sabu
> > > >>>
> > > >>> 85jobins...@gmail.com
> > > >>
> > > >>
> > >
> > >
> >
> 
