Hi; I'm aware of one integration that uses the REST API. I'm not working on that application anymore, so I have no idea if the web crawler remains an important feature today, or if it would remain part of the product if the REST API disappears.
I would add one more concern to the list: the NutchServer start() method hard-codes the protocol "http". Even if the protocol were not hard-coded, there is no way to configure NutchServer to start on https, and the private constructor makes it impossible to extend NutchServer to fix the issue.

IMHO, however, deprecating or removing the REST API and providing only a CLI could make Nutch less likely to be used in professional integrations, and therefore perhaps less likely to attract contributions.

That being said, I would not fight against option B.

Isabelle Giguère

On Fri, Feb 13, 2026 at 10:10, Sebastian Nagel <[email protected]> wrote:

> Hi Lewis,
>
> thanks for starting the discussion!
>
> I'd also opt for option B, but I'm biased because I'm not using the REST API.
>
> ~Sebastian
>
>
> On 2/13/26 02:18, Lewis John McGibbney wrote:
> > Some more context regarding limitations of the current approach. Nutch
> > server is limited to working with Nutch in local mode only. Although a Web
> > Application does exist [0], it was extracted from the codebase some 5 or so
> > years ago. It hasn't seen any development since then. As part of this
> > discussion I also want to share my opinion to retire and archive the Nutch
> > webapp repository.
> >
> > Thanks
> > lewismc
> >
> > [0] https://github.com/apache/nutch-webapp
> >
> > On 2026/02/13 01:13:19 lewis john mcgibbney wrote:
> >> Hi dev@,
> >>
> >> For a while now I've been thinking that the Nutch REST API (NutchServer,
> >> JAX-RS/Apache CXF) [0] has become somewhat of a burden. It hasn't seen much
> >> activity for quite a while and the underlying dependencies are dated. I'd
> >> personally like to get the community's input on whether the REST API is
> >> still a feature we wish to continue maintaining (albeit passively). Let me
> >> provide context below.
> >>
> >> Current state of the REST API
> >> =======================
> >> The REST API consists of ~35 Java source files under o.a.n.service.*,
> >> exposing endpoints for admin operations, job management, configuration,
> >> seed management, and database reading.
> >> From what I can tell the following issues exist:
> >>
> >>    1. No authentication or authorization whatsoever. Every endpoint is
> >>    completely open, including:
> >>       1. GET /admin/stop -- allows unauthenticated remote server shutdown
> >>       2. POST /job/create -- allows unrestricted job creation
> >>       3. PUT /config/{configId}/{propertyId} -- allows unrestricted
> >>       configuration modification
> >>       4. No input validation (no Bean Validation annotations), no CORS policy
> >>    2. No health or metrics endpoints: no /health or /healthz for
> >>    liveness/readiness probes (relevant for Docker/K8s deployments), and no
> >>    /metrics endpoint for Prometheus or similar. The Docker setup
> >>    (docker/Dockerfile) also has no health checks defined.
> >>    3. Near-zero test coverage. TestNutchServer.java contains a single test
> >>    that attempts to start the server and hit the /admin endpoint, but both
> >>    assertions are commented out (//Assert.assertTrue(...)). There are no
> >>    endpoint-level tests for any of the other resources (job, config, seed, db,
> >>    reader).
> >>    4. Code quality issues:
> >>       1. Class name typo: ReaderResouce (missing 'r')
> >>       2. SequenceReader.java has 6 auto-generated TODO stubs (unimplemented
> >>       methods)
> >>       3. No OpenAPI/Swagger documentation (something I've wanted to do for
> >>       a long time)
> >>
> >> The question for the community
> >> ========================
> >> Addressing these issues properly would be a substantial amount of work. At
> >> the same time, it's unclear how widely the REST API is actually being used
> >> (in production) by Nutch users.
> >>
> >> Therefore I'd like to propose a few items for discussion:
> >>
> >> (A) Invest in hardening the REST API. Add auth (at minimum API key or basic
> >> auth), input validation, health/metrics endpoints, OpenAPI docs, and proper
> >> test coverage. This is the most work but preserves the API for users who
> >> depend on it. Quite honestly, even if we selected this option, I would
> >> propose we start over with a fresh OpenAPI specification and build it out
> >> from there.
> >> (B) Deprecate the REST API. Mark it as deprecated in the current release
> >> with a notice that it will be removed in a future major version, giving
> >> users time to migrate to CLI-based workflows or their own orchestration
> >> layer.
> >> (C) Remove the REST API. Remove the o.a.n.service package entirely from the
> >> codebase. This eliminates the security surface and ongoing maintenance
> >> burden.
> >>
> >> If anyone on the list is actively using the REST API (or knows of
> >> deployments that do), it would be very helpful to hear about your use case
> >> and whether the current API meets your needs.
> >>
> >> Thanks,
> >> lewismc
> >>
> >> P.S. My current leaning is towards option B but I am really keen to read
> >> other opinions.
> >>
> >> [0]
> >> https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/service
> >> --
> >> http://people.apache.org/keys/committer/lewismc
> >>
> >
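
On the hard-coded "http" concern raised at the top of this message: the following is only a rough sketch of what a configurable scheme could look like, not NutchServer's actual code. The class name ServerUriSketch, the method baseUri, and the host and port values are all hypothetical, chosen for illustration. It assumes the scheme would be read from some configuration value and restricted to "http" or "https". Building the address this way would not enable TLS by itself; the server would still need to be set up for it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: builds the server base URI from a
// configurable scheme instead of a hard-coded "http".
final class ServerUriSketch {

    // Only these schemes are accepted in this sketch.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    // Returns e.g. https://localhost:8081; a null scheme falls back to "http".
    static URI baseUri(String scheme, String host, int port) {
        String normalized = (scheme == null)
                ? "http"
                : scheme.trim().toLowerCase(Locale.ROOT);
        if (!ALLOWED_SCHEMES.contains(normalized)) {
            throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
        try {
            // scheme, userInfo, host, port, path, query, fragment
            return new URI(normalized, null, host, port, null, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid server address", e);
        }
    }

    public static void main(String[] args) {
        // Example values only; a real integration would read them from configuration.
        System.out.println(baseUri("https", "localhost", 8081));
    }
}
```

A non-private constructor or a factory method taking the scheme would be one way to let integrators supply this value without patching the class.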

