dpol1 commented on PR #1973: URL: https://github.com/apache/stormcrawler/pull/1973#issuecomment-4886038733
Checked all seven against the code, the robot did its homework. Replies inline.

> So `fetch.statusCode` and `<prefix>retry-after` are stripped before the queue-stream emit unless both are added to `metadata.persist`.

Confirmed, my tests hand-crafted the metadata so they never hit the filter. I'll document the two `metadata.persist` entries and make `prepare()` warn when they don't survive the filter. Emitting pre-filter metadata changes the #1974 contract in core, @jnioche's call.

> extracting it (e.g. into `ManagedChannelUtil`) and reusing it here would fix the bug and delete two copies.

Bug confirmed, fixing. Not deleting the copies though: Spout and updater validate differently on multi-node (== vs multiple), and the bolt sends `local=false` so any node works. It'll just resolve `urlfrontier.address` itself.

> the addition wants `Math.addExact` + clamp too.

Yep, taking it.

> Cheap guard: skip the block for the sentinel key.

Taking it.

> drop `waitForReady` or add a short `withDeadlineAfter`

Adding the deadline. The per-key cache I'd leave for the adaptive back-off follow-up (#867 phase 2), duplicate blocks are idempotent frontier-side anyway.

> Asserting the URL is *never* handed out (first call empty) would test the block

Right, the current assertion is vacuous once the URL goes in-flight. Inverting: block first, then seed.

> A protected helper in the abstract class next to the `declareStream` call would keep the contract uniform.

Follow-up material for me, not touching core + OpenSearch for two call sites. Happy to do it when a third consumer shows up.

> formatting the test inputs with the JDK's `RFC_1123_DATE_TIME` would make the test an actual cross-check.

Take it easy Claude: it prints single-digit days unpadded, which the strict `dd` pattern rejects on purpose. Swap would break for a third of the month. I'll use it for two-digit days and add a rejection test for the unpadded form.

Fixes incoming.
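For anyone following along, the padding mismatch can be reproduced with the JDK alone. This is a minimal sketch, not the PR's code: the `STRICT` pattern below is an assumed stand-in for the strict `dd` parser, and the class and helper names are made up for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class Rfc1123PaddingCheck {

    // Assumption: a fixed-width day-of-month pattern, standing in for the
    // strict parser discussed above (the real pattern in the PR may differ).
    static final DateTimeFormatter STRICT =
            DateTimeFormatter.ofPattern("EEE, dd MMM uuuu HH:mm:ss 'GMT'", Locale.ENGLISH);

    // Formats a June 2024 timestamp with the JDK's built-in RFC 1123 formatter.
    static String jdkFormat(int dayOfMonth) {
        ZonedDateTime t = ZonedDateTime.of(2024, 6, dayOfMonth, 10, 0, 0, 0, ZoneOffset.UTC);
        return DateTimeFormatter.RFC_1123_DATE_TIME.format(t);
    }

    // Returns true when the fixed-width pattern can parse the given text.
    static boolean strictAccepts(String text) {
        try {
            LocalDateTime.parse(text, STRICT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (int day : new int[] {5, 15}) {
            String formatted = jdkFormat(day);
            System.out.println(formatted + " -> accepted by dd pattern: " + strictAccepts(formatted));
        }
    }
}
```

The JDK formatter writes day 5 as `5`, which a two-digit `dd` field refuses, while day 15 round-trips. That is why the cross-check only holds for days 10 and later.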
No rush on the human review, the robot already gave me enough homework 😄

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
