gianm commented on issue #13936: URL: https://github.com/apache/druid/issues/13936#issuecomment-1777699171
From reading the code, as far as I can tell, the idle state is not persisted, and after a supervisor restart it isn't re-established until `inactiveAfterMillis` has elapsed. I can think of a couple of options for addressing this.

One is to treat `idle` like `suspended`, by putting it in the supervisor spec and persisting it. However, this seems weird to me, since the supervisor would need to edit its own spec once it enters the idle state, and typically supervisors don't edit their own specs in the metadata store. So this wouldn't be my preferred choice.

Another, IMO better, approach is to adjust the supervisor startup logic so that it enters the idle state immediately when that makes sense. The logic could be something like: if the committed offsets match the current stream offsets, and no tasks are running, go into idle immediately. I think we check the current stream offsets once a minute, so the main risk is that we launch tasks up to a minute late in the case where the offsets matched at startup but a new message arrives right after. But that should be rare, and if it does happen, it means messages are coming in quite infrequently. So it's probably fine to err on the side of staying idle for up to a minute.

Thoughts?
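
For concreteness, the startup check described above (committed offsets match the current stream offsets, and no tasks are running) could be sketched roughly as follows. This is only an illustration: `IdleOnStartup`, `shouldStartIdle`, and the partition-to-offset maps are hypothetical names and shapes, not existing Druid classes or methods, and the choice to stay non-idle when the stream offsets are unknown is an assumption.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of Druid.
final class IdleOnStartup {
    private IdleOnStartup() {}

    /**
     * Decides whether a supervisor could enter the idle state right at startup.
     *
     * @param committedOffsets    partition -> last committed offset (assumed shape)
     * @param latestStreamOffsets partition -> current end offset of the stream (assumed shape)
     * @param activeTaskCount     number of ingestion tasks currently running
     */
    static boolean shouldStartIdle(
        Map<Integer, Long> committedOffsets,
        Map<Integer, Long> latestStreamOffsets,
        int activeTaskCount
    ) {
        // Running tasks mean ingestion is in progress, so do not go idle.
        if (activeTaskCount > 0) {
            return false;
        }
        // Assumption: if stream offsets have not been fetched yet, we cannot
        // tell whether the stream is caught up, so do not go idle.
        if (latestStreamOffsets.isEmpty()) {
            return false;
        }
        // Idle only if every partition is fully caught up.
        return committedOffsets.equals(latestStreamOffsets);
    }
}
```

With this shape, a supervisor that restarts while fully caught up would stay idle, and the existing once-a-minute offset check would be what eventually moves it out of idle when new data arrives.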
