Re: [PR] #784 - Support Retry-After in FetcherBolt [stormcrawler]

via GitHub Tue, 16 Jun 2026 03:51:05 -0700


jnioche commented on PR #1944:
URL: https://github.com/apache/stormcrawler/pull/1944#issuecomment-4717937566


   Yes, the split makes sense.
   
   > One thing I'd like your take on: should the re-emitted URLs go out as
   > `Status.ERROR` (reuses the existing path, but carries error/retry semantics),
   > or should we set an explicit future `nextFetchDate` so the scheduler honors the
   > exact back-off?
   
   `Status.ERROR` is not the right status: it indicates an irremediable problem
   with the content of the document, such as a PDF that cannot be parsed, or a
   URL blocked by robots.txt.
   
   We could set an explicit `nextFetchDate`, but I think just mimicking what is
   done via `crawl-delay-too-long` would be good enough.
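
   For reference, if the explicit `nextFetchDate` route were taken, the
   computation could look roughly like the sketch below. It is not taken from
   the PR or from StormCrawler: the class `RetryAfterSketch` and the method
   `nextFetchDate` are hypothetical names, and the sketch only assumes the two
   standard forms of the `Retry-After` header value (a number of seconds or an
   HTTP date). How the result would be passed to the status stream is left out.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

public class RetryAfterSketch {

    /**
     * Hypothetical helper: converts a Retry-After header value into the
     * earliest instant at which the URL should be fetched again.
     * Returns an empty Optional when the value is missing or unparsable.
     */
    static Optional<Instant> nextFetchDate(String retryAfter, Instant now) {
        if (retryAfter == null || retryAfter.trim().isEmpty()) {
            return Optional.empty();
        }
        String value = retryAfter.trim();

        // Form 1: delay in seconds, a non-negative integer such as "120".
        if (value.chars().allMatch(Character::isDigit)) {
            try {
                return Optional.of(now.plusSeconds(Long.parseLong(value)));
            } catch (NumberFormatException | ArithmeticException | DateTimeException e) {
                // Value too large to represent: treat as unparsable.
                return Optional.empty();
            }
        }

        // Form 2: HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
        try {
            Instant date =
                    ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            // A date in the past means the URL can be retried right away.
            return Optional.of(date.isBefore(now) ? now : date);
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

   Passing `now` in explicitly keeps the helper deterministic and easy to test;
   a caller would typically supply `Instant.now()`.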


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
