I see no real reason to keep the existing service around, so I vote for
option B.

On Sat, Feb 14, 2026 at 2:53 AM Lewis John McGibbney <[email protected]>
wrote:

> I didn't finish my thoughts for B(ii)
>
> B(ii)
> - provide an OpenAPI specification (see my comments at
> https://github.com/apache/nutch/pull/896) and address the errors and
> warnings
> - immediately deprecate the existing service and note it is for
> non-production use.
> - a further issue is whether we ship the openapi.yaml (and generated
> server implementation) with the Nutch codebase or separate it out to a new
> repository. If we were to keep it in the existing repository, then I
> would opt for a build flag (a compile-time flag of sorts) which would
> allow the creation of artifacts with or without the Nutch service included.
>
> Zooming out, if the OpenAPI exists then theoretically ANYONE can come
> along and generate their own server and client implementation in any
> language they want. The implementation just needs to know about NUTCH_HOME,
> etc. and be able to interface with Nutch classes.
>
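
To make the "any language" point concrete, here is a minimal Python sketch
of the glue such an implementation might start from. It is only an
illustration under stated assumptions: the function names are hypothetical,
it shells out to the bin/nutch script under NUTCH_HOME rather than calling
Nutch classes directly, and it is not taken from the PR or the OpenAPI
specification.

```python
import os
from pathlib import Path


def resolve_nutch_home(env=None):
    """Return NUTCH_HOME as a Path, or raise if it is unset (hypothetical helper)."""
    env = os.environ if env is None else env
    home = env.get("NUTCH_HOME")
    if not home:
        raise RuntimeError("NUTCH_HOME is not set")
    return Path(home)


def build_command(nutch_home, tool, args):
    """Build the argv for running one Nutch tool through bin/nutch.

    Assumption: the service delegates to the command-line script instead of
    interfacing with Nutch classes in-process.
    """
    return [str(Path(nutch_home) / "bin" / "nutch"), tool, *args]
```

A generated server stub could pass the result of build_command() to
subprocess.run(); keeping job state out of the service process would still
need to be solved separately.
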
> One further thought I had is that the Nutch service should theoretically
> be able to interface with a Hadoop cluster in order to fetch (and cache)
> state rather than persist state in the running service process. I believe
> this is another shortcoming of the current implementation. Finally (for
> now) no authentication layer exists with the target Hadoop cluster...
>
> On 2026/02/14 01:29:31 Lewis John McGibbney wrote:
> > Hi Isabelle,
> >
> > On 2026/02/13 16:53:29 Isabelle Giguere wrote:
> >
> > > I'm aware of one integration that uses the REST API.  I'm not working on
> > > that application anymore, so I have no idea if the web crawler remains an
> > > important feature today, or if it would remain part of the product if the
> > > REST API disappears.
> >
> > OK
> >
> > > I would add one more concern to the list:
> > > NutchServer's start() method hard-codes the protocol "http". Even if the
> > > protocol were not hard-coded, there is no way to configure NutchServer
> > > to start on https, and the private constructor makes it impossible to
> > > extend NutchServer to fix the issue.
> >
> > Excellent observation. Yet one more nail in the coffin for the current
> > implementation as-is.
> >
> > > IMHO, deprecating or removing the REST API and providing only a CLI,
> > > however, could make Nutch less likely to be used in professional
> > > integrations.  And therefore, maybe less likely to attract contributions.
> >
> > I agree to some degree. In reality the Nutch service is not production
> > grade right now. I haven't heard about any production usage in a while
> > (apart from your above reference) and I think the current implementation
> > may do more harm than good when shipped with the current Nutch releases.
> > It also adds unnecessary (and outdated) dependency bloat to the Nutch .job
> > artifacts we submit to the Hadoop cluster.
> >
> > > That being said, I would not fight against option B.
> >
> > You got me thinking that there are maybe two sub-options for B.
> >
> > B(i) as per my original narrative
> >
> > B(ii) provide an OpenAPI specification (see my comments at
> > https://github.com/apache/nutch/pull/896) and deprecate the existing
> > service for removal in the next version of Nutch. If the OpenAPI exists
> > then theoretically ANYONE can come along and generate their own server and
> > client implementation in any language they want. The implementation just
> > needs to know about NUTCH_HOME, etc. and be able to interface with Nutch
> > classes.
> >
> > If we can get enough peer review for the above PR then I would be in
> > favor of option B(ii).
> >
> > Sorry for making this more complicated.
> >
>
