I don't see any real reason to keep the existing service around, so I vote for option B.

On Sat, Feb 14, 2026 at 2:53 AM Lewis John McGibbney <[email protected]> wrote:

> I didn't finish my thoughts for B(ii)
>
> B(ii)
> - provide an OpenAPI specification (see my comments at
>   https://github.com/apache/nutch/pull/896) and address the errors and
>   warnings
> - immediately deprecate the existing service and note it is for
>   non-production use.
> - a further issue concerns whether we ship the openapi.yaml (and
>   generated server implementation) with the Nutch codebase or separate
>   it out to a new repository. If we were to keep it with the existing
>   repository, then I would opt for some build flag which would allow
>   the creation of artifacts with or without Nutch service included. A
>   compile-time flag of sorts.
>
> Zooming out, if the OpenAPI exists then theoretically ANYONE can come
> along and generate their own server and client implementation in any
> language they want. The implementation just needs to know about
> NUTCH_HOME, etc. and be able to interface with Nutch classes.
>
> One further thought I had is that the Nutch service should
> theoretically be able to interface with a Hadoop cluster in order to
> fetch (and cache) state rather than persist state in the running
> service process. I believe this is another shortcoming of the current
> implementation. Finally (for now) no authentication layer exists with
> the target Hadoop cluster...
>
> On 2026/02/14 01:29:31 Lewis John McGibbney wrote:
> > Hi Isabelle,
> >
> > On 2026/02/13 16:53:29 Isabelle Giguere wrote:
> >
> > > I'm aware of one integration that uses the REST API. I'm not
> > > working on that application anymore, so I have no idea if the web
> > > crawler remains an important feature today, or if it would remain
> > > part of the product if the REST API disappears.
> >
> > OK
> >
> > > I would add one more concern to the list:
> > > NutchServer method start() hard-codes the protocol "http", and
> > > there is no way to configure NutchServer to start on https, even
> > > if the protocol was not hard-coded, and the private constructor
> > > makes it impossible to extend NutchServer to fix the issue.
> >
> > Excellent observation. Yet one more nail in the coffin for the
> > current implementation as-is.
> >
> > > IMHO, deprecating or removing the REST API and providing only a
> > > CLI, however, could make Nutch less likely to be used in
> > > professional integrations. And therefore, maybe less likely to
> > > attract contributions.
> >
> > I agree to some degree. In reality the Nutch service is not
> > production grade right now. I haven't heard about any production
> > usage in a while (apart from your above reference) and I think the
> > current implementation maybe does more damage than good when shipped
> > with the current Nutch releases. It also adds (outdated) and
> > unnecessary dependency bloat to the Nutch .job artifacts we submit
> > to the Hadoop cluster.
> >
> > > That being said, I would not fight against option B.
> >
> > You got me thinking that there are maybe two sub-options for B.
> >
> > B(i) as per my original narrative
> >
> > B(ii) provide an OpenAPI specification (see my comments at
> > https://github.com/apache/nutch/pull/896) and deprecate the existing
> > service for removal in the next version of Nutch. If the OpenAPI
> > exists then theoretically ANYONE can come along and generate their
> > own server and client implementation in any language they want. The
> > implementation just needs to know about NUTCH_HOME, etc. and be able
> > to interface with Nutch classes.
> >
> > If we can get enough peer review for the above PR then I would be in
> > favor of option B(ii).
> >
> > Sorry for making this more complicated.

