Thanks for the feedback. I think option B(ii) sounds good.
I'll have a look at the PR.

Isabelle Giguère

On Sat, Feb 14, 2026 at 19:32, Joe Gilvary <[email protected]> wrote:

> I've never used it, so I'm thinking the B(ii) sounds best, unless we hear
> more from current users. I'd prefer the compile-time flag option, and
> unless we know at some point that no one uses it, some hand-holding
> tutorial content for it.
>
> I've worked local all along so I defer to those with experience on the
> Hadoop concerns.
>
> Thanks, stay safe, stay healthy,
>
> Joe
>
> On 2/13/26 21:08, BlackIce wrote:
>
> I think that there is no real reason to keep that around, so I say option
> B.
>
> On Sat, Feb 14, 2026 at 2:53 AM Lewis John McGibbney <[email protected]>
> wrote:
>
>> I didn't finish my thoughts for B(ii)
>>
>> B(ii)
>> - provide an OpenAPI specification (see my comments at
>> https://github.com/apache/nutch/pull/896) and address the errors and
>> warnings
>> - immediately deprecate the existing service and note it is for
>> non-production use.
>> - a further issue concerns whether we ship the openapi.yaml (and
>> generated server implementation) with the Nutch codebase or separate it
>> out to a new repository. If we were to keep it with the existing
>> repository, then I would opt for some build flag which would allow the
>> creation of artifacts with or without Nutch service included. A
>> compile-time flag of sorts.
>>
>> Zooming out, if the OpenAPI exists then theoretically ANYONE can come
>> along and generate their own server and client implementation in any
>> language they want. The implementation just needs to know about
>> NUTCH_HOME, etc. and be able to interface with Nutch classes.
>>
>> One further thought I had is that the Nutch service should theoretically
>> be able to interface with a Hadoop cluster in order to fetch (and cache)
>> state rather than persist state in the running service process. I believe
>> this is another shortcoming of the current implementation.
>> Finally (for now) no authentication layer exists with the target Hadoop
>> cluster...
>>
>> On 2026/02/14 01:29:31 Lewis John McGibbney wrote:
>> > Hi Isabelle,
>> >
>> > On 2026/02/13 16:53:29 Isabelle Giguere wrote:
>> >
>> > > I'm aware of one integration that uses the REST API. I'm not working
>> > > on that application anymore, so I have no idea if the web crawler
>> > > remains an important feature today, or if it would remain part of the
>> > > product if the REST API disappears.
>> >
>> > OK
>> >
>> > > I would add one more concern to the list:
>> > > NutchServer method start() hard-codes the protocol "http", and there
>> > > is no way to configure NutchServer to start on https, even if the
>> > > protocol was not hard-coded, and the private constructor makes it
>> > > impossible to extend NutchServer to fix the issue.
>> >
>> > Excellent observation. Yet one more nail in the coffin for the current
>> > implementation as-is.
>> >
>> > > IMHO, deprecating or removing the REST API and providing only a CLI,
>> > > however, could make Nutch less likely to be used in professional
>> > > integrations. And therefore, maybe less likely to attract
>> > > contributions.
>> >
>> > I agree to some degree. In reality the Nutch service is not production
>> > grade right now. I haven't heard about any production usage in a while
>> > (apart from your above reference) and I think the current implementation
>> > maybe does more damage than good when shipped with the current Nutch
>> > releases. It also adds (outdated) and unnecessary dependency bloat to
>> > the Nutch .job artifacts we submit to the Hadoop cluster.
>> >
>> > > That being said, I would not fight against option B.
>> >
>> > You got me thinking that there are maybe two sub-options for B.
>> >
>> > B(i) as per my original narrative
>> >
>> > B(ii) provide an OpenAPI specification (see my comments at
>> > https://github.com/apache/nutch/pull/896) and deprecate the existing
>> > service for removal in the next version of Nutch. If the OpenAPI exists
>> > then theoretically ANYONE can come along and generate their own server
>> > and client implementation in any language they want. The implementation
>> > just needs to know about NUTCH_HOME, etc. and be able to interface with
>> > Nutch classes.
>> >
>> > If we can get enough peer review for the above PR then I would be in
>> > favor of option B(ii).
>> >
>> > Sorry for making this more complicated.
>> >

