Hi Isabelle,

On 2026/02/13 16:53:29 Isabelle Giguere wrote:

> I'm aware of one integration that uses the REST API.  I'm not working on
> that application anymore, so I have no idea if the web crawler remains an
> important feature today, or if it would remain part of the product if the
> REST API disappears.

OK

> I would add one more concern to the list:
> NutchServer method start() hard-codes the protocol "http", and there is no
> way to configure NutchServer to start on https, even if the protocol was
> not hard-coded, and the private constructor makes it impossible to extend
> NutchServer to fix the issue.

Excellent observation. Yet one more nail in the coffin for the current 
implementation as-is. 

> IMHO, deprecating or removing the REST API and providing only a CLI,
> however, could make Nutch less likely to be used in professional
> integrations.  And therefore, maybe less likely to attract contributions.

I agree to some degree. In reality, the Nutch service is not production-grade 
right now. I haven't heard about any production usage in a while (apart from 
your reference above), and I think the current implementation may do more 
harm than good when shipped with the current Nutch releases. It also adds 
unnecessary (and outdated) dependency bloat to the Nutch .job artifacts we 
submit to the Hadoop cluster.

> That being said, I would not fight against option B.

You got me thinking that there may be two sub-options for B. 

B(i) as per my original narrative 

B(ii) provide an OpenAPI specification (see my comments at 
https://github.com/apache/nutch/pull/896) and deprecate the existing service 
for removal in the next version of Nutch. If the OpenAPI specification exists, 
then theoretically ANYONE can come along and generate their own server and client 
implementation in any language they want. The implementation just needs to know 
about NUTCH_HOME, etc., and be able to interface with Nutch classes.

If we can get enough peer review for the above PR, then I would be in favor of 
option B(ii).

Sorry for making this more complicated.
