Some more context regarding the limitations of the current approach: the Nutch server only works with Nutch in local mode. Although a web application does exist [0], it was extracted from the codebase about 5 years ago and hasn't seen any development since then. As part of this discussion, I would also like to share my opinion that we should retire and archive the Nutch webapp repository.

Thanks
lewismc
[0] https://github.com/apache/nutch-webapp

On 2026/02/13 01:13:19 lewis john mcgibbney wrote:
> Hi dev@,
>
> For a while now I've been thinking that the Nutch REST API (NutchServer,
> JAX-RS/Apache CXF) [0] has become somewhat of a burden. It hasn't seen much
> activity for quite a while and the underlying dependencies are dated. I'd
> personally like to get the community's input on whether the REST API is
> still a feature we wish to continue maintaining (albeit passively). Let me
> provide context below.
>
> Current state of the REST API
> =============================
> The REST API consists of ~35 Java source files under o.a.n.service.*,
> exposing endpoints for admin operations, job management, configuration,
> seed management, and database reading.
> From what I can tell, the following issues exist:
>
> 1. No authentication or authorization whatsoever. Every endpoint is
>    completely open, including:
>    1. GET /admin/stop -- allows unauthenticated remote server shutdown
>    2. POST /job/create -- allows unrestricted job creation
>    3. PUT /config/{configId}/{propertyId} -- allows unrestricted
>       configuration modification
>    4. No input validation (no Bean Validation annotations) and no CORS
>       policy
> 2. No health or metrics endpoints: no /health or /healthz for
>    liveness/readiness probes (relevant for Docker/K8s deployments), and no
>    /metrics endpoint for Prometheus or similar. The Docker setup
>    (docker/Dockerfile) also has no health checks defined.
> 3. Near-zero test coverage. TestNutchServer.java contains a single test
>    that attempts to start the server and hit the /admin endpoint, but both
>    assertions are commented out (//Assert.assertTrue(...)). There are no
>    endpoint-level tests for any of the other resources (job, config, seed,
>    db, reader).
> 4. Code quality issues:
>    1. Class name typo: ReaderResouce (missing 'r')
>    2. SequenceReader.java has 6 auto-generated TODO stubs (unimplemented
>       methods)
>    3. No OpenAPI/Swagger documentation (something I've wanted to do for
>       a long time)
>
> The question for the community
> ==============================
> Addressing these issues properly would be a substantial amount of work. At
> the same time, it's unclear how widely the REST API is actually being used
> (in production) by Nutch users.
>
> Therefore I'd like to propose a few items for discussion:
>
> (A) Invest in hardening the REST API. Add auth (at minimum API key or basic
> auth), input validation, health/metrics endpoints, OpenAPI docs, and proper
> test coverage. This is the most work but preserves the API for users who
> depend on it. Quite honestly, even if we selected this option, I would
> propose we start over with a fresh OpenAPI specification and build it out
> from there.
> (B) Deprecate the REST API. Mark it as deprecated in the current release
> with a notice that it will be removed in a future major version, giving
> users time to migrate to CLI-based workflows or their own orchestration
> layer.
> (C) Remove the REST API. Remove the o.a.n.service package entirely from the
> codebase. This eliminates the security surface and ongoing maintenance
> burden.
>
> If anyone on the list is actively using the REST API (or knows of
> deployments that do), it would be very helpful to hear about your use case
> and whether the current API meets your needs.
>
> Thanks,
> lewismc
>
> P.S. My current leaning is towards option B, but I am really keen to read
> other opinions.
>
> [0] https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/service
> --
> http://people.apache.org/keys/committer/lewismc
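To make option (A) a little more concrete, here is a minimal sketch of two of the cheapest fixes listed above: an API-key check in front of an admin operation such as /admin/stop, and an unauthenticated /health endpoint for liveness/readiness probes. It deliberately uses the JDK's built-in com.sun.net.httpserver instead of JAX-RS/Apache CXF, so it is not how NutchServer is wired today. The X-API-Key header, the NUTCH_API_KEY environment variable, the class name and port 8081 are all hypothetical choices for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical sketch for option (A): API-key guard plus a /health probe.
// Uses the JDK's com.sun.net.httpserver, NOT JAX-RS/Apache CXF as NutchServer does.
class HardenedServerSketch {

    // Hypothetical header name; not part of any existing Nutch API.
    static final String API_KEY_HEADER = "X-API-Key";

    // Returns true only if a key is configured and the presented key matches it.
    // MessageDigest.isEqual does not return early at the first differing byte.
    static boolean isAuthorized(String presented, String expected) {
        if (presented == null || expected == null || expected.isEmpty()) {
            return false; // fail closed: no configured key means no admin access
        }
        return MessageDigest.isEqual(
                presented.getBytes(StandardCharsets.UTF_8),
                expected.getBytes(StandardCharsets.UTF_8));
    }

    // Writes a small plain-text response and closes the exchange's body stream.
    static void respond(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream os = exchange.getResponseBody()) {
            os.write(bytes);
        }
    }

    // Binds to loopback only; pass port 0 to let the OS pick a free port.
    static HttpServer start(int port, String apiKey) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);

        // Liveness/readiness target for Docker/K8s: unauthenticated, exposes nothing sensitive.
        server.createContext("/health", exchange -> respond(exchange, 200, "OK"));

        // Admin operation: 401 unless the caller presents the configured key.
        server.createContext("/admin/stop", exchange -> {
            String presented = exchange.getRequestHeaders().getFirst(API_KEY_HEADER);
            if (!isAuthorized(presented, apiKey)) {
                respond(exchange, 401, "Unauthorized");
                return;
            }
            // A real implementation would trigger a graceful shutdown here.
            respond(exchange, 200, "Stop requested");
        });

        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical environment variable; if unset, every admin request is rejected.
        String apiKey = System.getenv("NUTCH_API_KEY");
        HttpServer server = start(8081, apiKey);
        System.out.println("Listening on " + server.getAddress());
    }
}
```

With NUTCH_API_KEY set, a request such as `curl -H 'X-API-Key: <key>' http://127.0.0.1:8081/admin/stop` should get a 200, while the same request without the header gets a 401; /health answers 200 either way. In a CXF-based server the same check would more naturally live in a JAX-RS ContainerRequestFilter, and the key comparison fails closed so that an unconfigured deployment is not silently left open.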

