Hi,

I'm aware of one integration that uses the REST API.  I'm not working on
that application anymore, so I have no idea if the web crawler remains an
important feature today, or if it would remain part of the product if the
REST API disappears.

I would add one more concern to the list:
NutchServer's start() method hard-codes the protocol "http".  Even if the
protocol were not hard-coded, there is no way to configure NutchServer to
start on https, and the private constructor makes it impossible to extend
NutchServer to fix the issue.

IMHO, however, deprecating or removing the REST API and providing only a
CLI could make Nutch less likely to be used in professional integrations,
and therefore maybe less likely to attract contributions.

That being said, I would not fight against option B.

Isabelle Giguère


On Fri, Feb 13, 2026 at 10:10, Sebastian Nagel <[email protected]> wrote:

> Hi Lewis,
>
> thanks for starting the discussion!
>
> I'd also opt for option B, but I'm biased because I'm not using the REST API.
>
> ~Sebastian
>
>
> On 2/13/26 02:18, Lewis John McGibbney wrote:
> > Some more context regarding limitations of the current approach. Nutch
> > server is limited to working with Nutch in local mode only. Although a Web
> > Application does exist [0], it was extracted from the codebase some 5 or so
> > years ago. It hasn't seen any development since then. As part of this
> > discussion I also want to share my opinion that we should retire and
> > archive the Nutch webapp repository.
> > Thanks
> > lewismc
> >
> > [0] https://github.com/apache/nutch-webapp
> >
> > On 2026/02/13 01:13:19 lewis john mcgibbney wrote:
> >> Hi dev@,
> >>
> >> For a while now I've been thinking that the Nutch REST API (NutchServer,
> >> JAX-RS/Apache CXF) [0] has become somewhat of a burden. It hasn't seen much
> >> activity for quite a while and the underlying dependencies are dated. I'd
> >> personally like to get the community's input on whether the REST API is
> >> still a feature we wish to continue maintaining (albeit passively). Let me
> >> provide context below.
> >>
> >> Current state of the REST API
> >> =======================
> >> The REST API consists of ~35 Java source files under o.a.n.service.*,
> >> exposing endpoints for admin operations, job management, configuration,
> >> seed management, and database reading.
> >>  From what I can tell the following issues exist:
> >>
> >>     1. No authentication or authorization whatsoever. Every endpoint is
> >>     completely open, including:
> >>        1. GET /admin/stop -- allows unauthenticated remote server shutdown
> >>        2. POST /job/create -- allows unrestricted job creation
> >>        3. PUT /config/{configId}/{propertyId} -- allows unrestricted
> >>        configuration modification
> >>        4. No input validation (no Bean Validation annotations) and no CORS
> >>        policy
> >>     2. No health or metrics endpoints: no /health or /healthz for
> >>     liveness/readiness probes (relevant for Docker/K8s deployments), and no
> >>     /metrics endpoint for Prometheus or similar. The Docker setup
> >>     (docker/Dockerfile) also has no health checks defined.
> >>     3. Near-zero test coverage. TestNutchServer.java contains a single test
> >>     that attempts to start the server and hit the /admin endpoint, but both
> >>     assertions are commented out (//Assert.assertTrue(...)). There are no
> >>     endpoint-level tests for any of the other resources (job, config, seed,
> >>     db, reader).
> >>     4. Code quality issues:
> >>        1. Class name typo: ReaderResouce (missing 'r')
> >>        2. SequenceReader.java has 6 auto-generated TODO stubs (unimplemented
> >>        methods)
> >>        3. No OpenAPI/Swagger documentation (something I've wanted to do for
> >>        a long time)
> >>
> >> The question for the community
> >> ========================
> >> Addressing these issues properly would be a substantial amount of work. At
> >> the same time, it's unclear how widely the REST API is actually being used
> >> (in production) by Nutch users.
> >>
> >> Therefore I'd like to propose a few items for discussion:
> >>
> >> (A) Invest in hardening the REST API. Add auth (at minimum API key or basic
> >> auth), input validation, health/metrics endpoints, OpenAPI docs, and proper
> >> test coverage. This is the most work but preserves the API for users who
> >> depend on it. Quite honestly, even if we selected this option, I would
> >> propose we start over with a fresh OpenAPI specification and build it out
> >> from there.
> >> (B) Deprecate the REST API. Mark it as deprecated in the current release
> >> with a notice that it will be removed in a future major version, giving
> >> users time to migrate to CLI-based workflows or their own orchestration
> >> layer.
> >> (C) Remove the REST API. Remove the o.a.n.service package entirely from the
> >> codebase. This eliminates the security surface and ongoing maintenance
> >> burden.
> >>
> >> If anyone on the list is actively using the REST API (or knows of
> >> deployments that do), it would be very helpful to hear about your use case
> >> and whether the current API meets your needs.
> >>
> >> Thanks,
> >> lewismc
> >>
> >> P.S. My current leaning is towards option B but I am really keen to read
> >> other opinions.
> >>
> >> [0] https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/service
> >> --
> >> http://people.apache.org/keys/committer/lewismc
> >>
>
>
