dpol1 commented on PR #1973:
URL: https://github.com/apache/stormcrawler/pull/1973#issuecomment-4886038733

   Checked all seven against the code, the robot did its homework. Replies 
inline.
   
   > So `fetch.statusCode` and `<prefix>retry-after` are stripped before the 
queue-stream emit unless both are added to `metadata.persist`.
   
   Confirmed, my tests hand-crafted the metadata so they never hit the filter. 
I'll document the two `metadata.persist` entries and make `prepare()` warn when 
they don't survive the filter. Emitting pre-filter metadata changes the #1974 
contract in core, @jnioche's call.
   
   > extracting it (e.g. into `ManagedChannelUtil`) and reusing it here would 
fix the bug and delete two copies.
   
   Bug confirmed, fixing. Not deleting the copies though: Spout and updater 
validate differently on multi-node (== vs multiple), and the bolt sends 
`local=false` so any node works. It'll just resolve `urlfrontier.address` 
itself.
   
   > the addition wants `Math.addExact` + clamp too.
   
   Yep, taking it.
   
   > Cheap guard: skip the block for the sentinel key.
   
   Taking it.
   
   > drop `waitForReady` or add a short `withDeadlineAfter`
   
   Adding the deadline. The per-key cache I'd leave for the adaptive back-off 
follow-up (#867 phase 2), duplicate blocks are idempotent frontier-side anyway.
   
   > Asserting the URL is *never* handed out (first call empty) would test the 
block
   
   Right, the current assertion is vacuous once the URL goes in-flight. 
Inverting: block first, then seed.
   
   > A protected helper in the abstract class next to the `declareStream` call 
would keep the contract uniform.
   
   Follow-up material for me, not touching core + OpenSearch for two call 
sites. Happy to do it when a third consumer shows up.
   
   > formatting the test inputs with the JDK's `RFC_1123_DATE_TIME` would make 
the test an actual cross-check.
   
   Take it easy Claude: it prints single-digit days unpadded, which the strict 
`dd` pattern rejects on purpose. Swap would break for a third of the month. 
I'll use it for two-digit days and add a rejection test for the unpadded form.
   
   Fixes incoming. No rush on the human review, the robot already gave me 
enough homework 😄


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to