aho135 commented on PR #19191: URL: https://github.com/apache/druid/pull/19191#issuecomment-4218297071
> Thanks @aho135 for the feature! I’ve left some high-level comments. I’m slightly wary of having the supervisor manage unsupervised backfill tasks in the background (similar to the limitations like task failures/resiliency/reporting you already called out in the PR summary). > > As an alternative design, I wonder if it would be cleaner to leverage multi-supervisor support and spin up a new backfill supervisor via an API. For example, the high-level API could compute the offset ranges (similar to what’s done in this PR), reset the running supervisor, and then submit a new backfill supervisor based on the active supervisor spec with a constructed `transformSpec` (similar to #18757 for reading topic partition-offset ranges). Another option could be introducing a separate `ioConfig` for a backfill supervisor, although that might be more involved and potentially overkill. > > This would keep everything “supervised” and naturally address some of the limitations in the current design around task supervision. It would also provide a clear separation between the running and backfill supervisors from an operational standpoint. > > What are your thoughts? Thanks for the review @abhishekrb19! > As an alternative design, I wonder if it would be cleaner to leverage multi-supervisor support and spin up a new backfill supervisor via an API I was debating the same, but felt that this doesn't really fit into the existing StreamSupervisor implementation for these reasons: - taskDuration doesn't make sense in the context of a backfill task since ideally we would just want it to run to completion - Supervisors are long running. With this kind of short-lived Supervisor I think this requires some additional logic in the SupervisorManager to track the tasks to completion and then terminate the Supervisor. - These differences make me think it'd be worthwhile to implement a new kind of Supervisor that can handle task tracking and handle termination. > constructed `transformSpec` (similar to #18757 for reading topic partition-offset ranges) I found the `transformSpec` approach a bit limiting because there we would be processing the entire range of data from earliest and dropping everything until we reach the needed offset. Wouldn't it be better to just directly use startSequenceNumbers/endSequenceNumbers and set partitionSequenceNumberMap like in this PR? Lmk if I'm missing something with the approach here though! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
