kfaraz commented on PR #17509: URL: https://github.com/apache/druid/pull/17509#issuecomment-2499689770
@georgew5656, this probably doesn't qualify as a bug per se. It is a quality of life change but there is no incorrect behaviour here. IIUC, this PR tries to: - eagerly identify the conditions which result in the messages below. - fail the commit in a retryable manner rather than failing the whole task. > [Case A] The new start metadata state[%s] is ahead of the last committed" + " end state[%s]. Try resetting the supervisor > [Case B] Inconsistency between stored metadata state[%s] and target state[%s]. Try resetting the supervisor. If this is the case, we probably don't even need the new method `canPublishSegments`. We just need to throw the right kind of exception for the above mentioned messages which can notify the task that it needs to retry. Or maybe the task action itself could retry. (something `segmentAllocate` action already does if the used segments seem to have changed). However, if we do decide to have the `canPublishSegments` method, it should have the same logic that is used while validating the commit i.e. the same logic that ends up throwing the "inconsistent metadata" errors. #### Alternatives Perhaps, we could just lock/queue on the (supervisor + groupId) combo? Publish should typically not take a long time, and failing every time some other task in the same group is already publishing seems like weird behaviour. The publish request coming from the second task could just wait while the first one finishes. Let me know what you think. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
