kaisun2000 commented on PR #15373: URL: https://github.com/apache/druid/pull/15373#issuecomment-1819642612
@abhishekagarwal87, the fix in #15260 from @gianm is in the query path: it checks for the potential null pointer condition and avoids it by returning a missing-segment error to the broker. On seeing the missing segment, the broker retries. My fix is in the ingestion path: it adds a configurable delay to the segment handoff phase. More specifically, after the peon gets the acknowledgement from the coordinator that the segments in this handoff batch have already been placed on some historicals, the peon waits for some time before it removes the segments from its timeline, resets the hydrants, and drops the segments and the files backing them in the file system. As @gianm [pointed out](https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1699584076521929?thread_ts=1698263651.554789&cid=C0309C9L90D), this race pattern is not just on the peon side; it also exists on the historical side when data segments are moved. The approach taken on the historical side is also to add a configurable delay. Thus adding a configurable delay is a "_good idea_", in @gianm's own words. So I would say the two approaches complement each other. See the details of the discussion in the Apache Slack channel [here](https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1699584076521929?thread_ts=1698263651.554789&cid=C0309C9L90D).
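
To make the ingestion-side idea concrete, here is a minimal sketch of "wait a configurable delay after the handoff acknowledgement, then drop". It is not the code in this PR and does not use Druid's classes: `DelayedSegmentDropSketch`, `onHandoffAcknowledged`, `dropExpired`, and `isStillQueryable` are hypothetical names, and the clock is passed in explicitly only so the behaviour is easy to follow. It also assumes timestamps never go backwards, so the queue stays ordered by drop time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; none of these names come from Druid.
class DelayedSegmentDropSketch {
    // A segment whose handoff was acknowledged, plus the earliest time it may be dropped.
    private static final class PendingDrop {
        final String segmentId;
        final long dropAtMillis;

        PendingDrop(String segmentId, long dropAtMillis) {
            this.segmentId = segmentId;
            this.dropAtMillis = dropAtMillis;
        }
    }

    private final long dropDelayMillis; // the configurable delay
    private final Deque<PendingDrop> pending = new ArrayDeque<>();

    DelayedSegmentDropSketch(long dropDelayMillis) {
        if (dropDelayMillis < 0) {
            throw new IllegalArgumentException("delay must be non-negative");
        }
        this.dropDelayMillis = dropDelayMillis;
    }

    // Called when the coordinator confirms the segment is loaded on a historical.
    // The segment is not dropped yet; it is only scheduled for a later drop.
    void onHandoffAcknowledged(String segmentId, long nowMillis) {
        pending.addLast(new PendingDrop(segmentId, nowMillis + dropDelayMillis));
    }

    // Drops every segment whose delay has elapsed and returns the ids dropped by this call.
    // A real task would remove the segment from the timeline, reset the hydrants,
    // and delete the backing files at this point.
    List<String> dropExpired(long nowMillis) {
        List<String> justDropped = new ArrayList<>();
        while (!pending.isEmpty() && pending.peekFirst().dropAtMillis <= nowMillis) {
            justDropped.add(pending.pollFirst().segmentId);
        }
        return justDropped;
    }

    // True while the segment is still waiting out its delay, so queries that were
    // routed to the peon before the broker learned about the handoff can still be served.
    boolean isStillQueryable(String segmentId) {
        for (PendingDrop p : pending) {
            if (p.segmentId.equals(segmentId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DelayedSegmentDropSketch sketch = new DelayedSegmentDropSketch(5_000);
        sketch.onHandoffAcknowledged("segment-a", 0);
        System.out.println(sketch.dropExpired(1_000)); // still inside the delay window
        System.out.println(sketch.dropExpired(5_000)); // delay has elapsed
    }
}
```

The point of the delay window is that in-flight queries that still target the peon find the segment there, while the check from #15260 covers anything that arrives after the drop.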
