Thanks for the feedback.

I think option B(ii) sounds good.

I'll have a look at the PR.

Isabelle Giguère

On Sat, Feb 14, 2026 at 19:32, Joe Gilvary <[email protected]> wrote:

> I've never used it, so I'm thinking the B(ii) sounds best, unless we hear
> more from current users. I'd prefer the compile-time flag option, and
> unless we know at some point that no one uses it, some hand-holding
> tutorial content for it.
>
> I've worked local all along so I defer to those with experience on the
> Hadoop concerns.
>
>  Thanks, stay safe, stay healthy,
>
>  Joe
> On 2/13/26 21:08, BlackIce wrote:
>
> I think that there is no real reason to keep that around, so I say option
> B.
>
> On Sat, Feb 14, 2026 at 2:53 AM Lewis John McGibbney <[email protected]>
> wrote:
>
>> I didn't finish my thoughts for B(ii)
>>
>> B(ii)
>> - provide an OpenAPI specification (see my comments at
>> https://github.com/apache/nutch/pull/896) and address the errors and
>> warnings
>> - immediately deprecate the existing service and note it is for
>> non-production use.
>> - a further issue concerns whether we ship the openapi.yaml (and
>> generated server implementation) with the Nutch codebase or separate it out
>> to a new repository. If we were to keep it with the existing repository,
>> then I would opt for some build flag which would allow the creation of
>> artifacts with or without Nutch service included. A compile-time flag of
>> sorts.
>>
>> Zooming out, if the OpenAPI exists then theoretically ANYONE can come
>> along and generate their own server and client implementation in any
>> language they want. The implementation just needs to know about NUTCH_HOME,
>> etc. and be able to interface with Nutch classes.
>>
>> One further thought I had is that the Nutch service should theoretically
>> be able to interface with a Hadoop cluster in order to fetch (and cache)
>> state rather than persist state in the running service process. I believe
>> this is another shortcoming of the current implementation. Finally (for
>> now) no authentication layer exists with the target Hadoop cluster...
>>
>> On 2026/02/14 01:29:31 Lewis John McGibbney wrote:
>> > Hi Isabelle,
>> >
>> > On 2026/02/13 16:53:29 Isabelle Giguere wrote:
>> >
>> > > I'm aware of one integration that uses the REST API.  I'm not working on
>> > > that application anymore, so I have no idea if the web crawler remains an
>> > > important feature today, or if it would remain part of the product if the
>> > > REST API disappears.
>> >
>> > OK
>> >
>> > > I would add one more concern to the list:
>> > > NutchServer method start() hard-codes the protocol "http", and there is no
>> > > way to configure NutchServer to start on https, even if the protocol was
>> > > not hard-coded, and the private constructor makes it impossible to extend
>> > > NutchServer to fix the issue.
>> >
>> > Excellent observation. Yet one more nail in the coffin for the current
>> > implementation as-is.
>> >
>> > > IMHO, deprecating or removing the REST API and providing only a CLI,
>> > > however, could make Nutch less likely to be used in professional
>> > > integrations.  And therefore, maybe less likely to attract contributions.
>> >
>> > I agree to some degree. In reality the Nutch service is not production
>> > grade right now. I haven't heard about any production usage in a while
>> > (apart from your above reference) and I think the current implementation
>> > maybe does more damage than good when shipped with the current Nutch
>> > releases. It also adds outdated and unnecessary dependency bloat to the
>> > Nutch .job artifacts we submit to the Hadoop cluster.
>> >
>> > > That being said, I would not fight against option B.
>> >
>> > You got me thinking that there are maybe two sub-options for B.
>> >
>> > B(i) as per my original narrative
>> >
>> > B(ii) provide an OpenAPI specification (see my comments at
>> > https://github.com/apache/nutch/pull/896) and deprecate the existing
>> > service for removal in the next version of Nutch. If the OpenAPI exists
>> > then theoretically ANYONE can come along and generate their own server and
>> > client implementation in any language they want. The implementation just
>> > needs to know about NUTCH_HOME, etc. and be able to interface with Nutch
>> > classes.
>> >
>> > If we can get enough peer review for the above PR then I would be in
>> > favor of option B(ii).
>> >
>> > Sorry for making this more complicated.
>> >
>>
>
