himanshug commented on pull request #10524:
URL: https://github.com/apache/druid/pull/10524#issuecomment-760533912
Going through the code, I notice that overall the patch does the
following:
(1) periodically update lag statistics in the background
(2) periodically trigger the "autoscaling algorithm" to see if tasks need to
be scaled up/down
(3) if the "autoscaling algorithm" decides that a scale up/down is needed,
update the new taskCount in the spec in the DB, and
(4) then "trigger" the scale up/down
In the current design, everything is embedded inside
`SeekableStreamSupervisor`. However, it looks like (1), (2) and (3) can
happen independently, outside of `SeekableStreamSupervisor`, to avoid putting
more stuff in there. The only change potentially needed inside
`SeekableStreamSupervisor` is to make it accept a way to trigger task
creation/deletion based on the updated taskCount in the spec.
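To make the split concrete, here is a rough sketch of steps (1)-(4) living outside the supervisor. Everything in it is an assumption for illustration: `ExternalScalingLoop`, the threshold parameters, and the one-task-at-a-time rule are hypothetical and are not what the patch or Druid actually does. The supervisor's only involvement is the `reconcileSignal` callback.

```java
import java.util.function.IntConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch: none of these names exist in Druid.
final class ExternalScalingLoop {
  private final LongSupplier lagSource;       // (1) source of the latest lag statistic
  private final IntConsumer reconcileSignal;  // (4) the one hook the supervisor would expose
  private final long scaleUpLag;
  private final long scaleDownLag;
  private final int minTasks;
  private final int maxTasks;
  private int taskCount;                      // stands in for taskCount persisted in the spec

  ExternalScalingLoop(LongSupplier lagSource, IntConsumer reconcileSignal,
                      long scaleUpLag, long scaleDownLag,
                      int minTasks, int maxTasks, int initialTaskCount) {
    this.lagSource = lagSource;
    this.reconcileSignal = reconcileSignal;
    this.scaleUpLag = scaleUpLag;
    this.scaleDownLag = scaleDownLag;
    this.minTasks = minTasks;
    this.maxTasks = maxTasks;
    this.taskCount = initialTaskCount;
  }

  // One pass over steps (1)-(4); a scheduler would call this periodically.
  int runOnce() {
    long lag = lagSource.getAsLong();                    // (1) read lag
    int desired = taskCount;                             // (2) run the algorithm
    if (lag > scaleUpLag) {
      desired = Math.min(maxTasks, taskCount + 1);
    } else if (lag < scaleDownLag) {
      desired = Math.max(minTasks, taskCount - 1);
    }
    if (desired != taskCount) {
      taskCount = desired;                               // (3) "persist" the new taskCount
      reconcileSignal.accept(desired);                   // (4) ask the supervisor to reconcile
    }
    return taskCount;
  }
}
```

With this shape the supervisor never sees lag thresholds at all; it only reacts to a new taskCount.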
The idea can be taken even a little further. You have used a specific
"autoscaling algorithm" based on heuristics provided by the user. I can
totally imagine other versions of the "autoscaling algorithm" that have more
intelligence built in and rely on less and less information from the user. I
can even see the need to support multiple "autoscaling algorithms" that users
can pick and choose from, so that different "autoscaling algorithms" can be
tried out without impacting existing users.
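One possible way to get that pick-and-choose behavior is a small registry keyed by an algorithm name taken from the spec. This is only a sketch under my own assumptions: `ScalingAlgorithm`, `AlgorithmRegistry`, and the names `"noop"` and `"lagThreshold"` are all made up here, and a real version would more likely hook into however the spec is already deserialized.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical strategy interface: maps current state to a desired task count.
interface ScalingAlgorithm {
  int desiredTaskCount(int currentTaskCount, long lag);
}

// Hypothetical registry; existing users keep whatever name they already use.
final class AlgorithmRegistry {
  private final Map<String, Supplier<ScalingAlgorithm>> factories = new HashMap<>();

  void register(String name, Supplier<ScalingAlgorithm> factory) {
    factories.put(name, factory);
  }

  ScalingAlgorithm create(String name) {
    Supplier<ScalingAlgorithm> factory = factories.get(name);
    if (factory == null) {
      throw new IllegalArgumentException("Unknown autoscaling algorithm: " + name);
    }
    return factory.get();
  }

  // Two illustrative entries; the threshold of 1000 is an arbitrary example value.
  static AlgorithmRegistry withExamples() {
    AlgorithmRegistry registry = new AlgorithmRegistry();
    registry.register("noop", () -> (current, lag) -> current);
    registry.register("lagThreshold", () -> (current, lag) -> lag > 1000 ? current + 1 : current);
    return registry;
  }
}
```

A new experimental algorithm then becomes one more `register` call, and nobody who did not opt in to it is affected.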
In fact, as a far-off thought, users might want to have their own autoscaler
algorithms based on stats specific to their deployment instead of Kafka topic
lag, e.g. checking the load metrics of their pipeline, which could trigger
autoscaling well before things start lagging.
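As an illustration of such a deployment-specific rule, the sketch below scales on a load metric rather than on lag, using a simple proportional rule: desired = ceil(current * utilization / target), clamped to a min/max. The class name, the idea of a single "utilization" number, and the clamp bounds are all assumptions of mine, not anything from the patch.

```java
// Hypothetical load-based rule; "utilization" could be any pipeline load metric
// normalized so that "target" is the level the user wants to run at.
final class LoadBasedScaler {
  private LoadBasedScaler() {}

  static int desiredTaskCount(int currentTaskCount, double utilization, double target,
                              int minTasks, int maxTasks) {
    if (target <= 0) {
      throw new IllegalArgumentException("target must be positive");
    }
    // Proportional rule: keep per-task load near the target.
    int raw = (int) Math.ceil(currentTaskCount * utilization / target);
    // Clamp so a noisy metric cannot push the count outside the allowed range.
    return Math.max(minTasks, Math.min(maxTasks, raw));
  }
}
```

For example, 4 tasks running at 0.75 utilization against a 0.5 target would ask for 6 tasks, even if lag is still zero at that point.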
One quick line of thought I had while reading the code was that we could have
something like:
```
// Add a new method to SupervisorSpec.
interface SupervisorSpec {
  ...
  SupervisorTaskAutoscaler createAutoscaler();
  ...
}

// start() sets up whatever scheduler is needed to collect stats and run the
// autoscaling algorithm logic; then, if needed, it updates taskCount in the
// spec and sends a signal to the supervisor to reconcile based on the new
// taskCount.
interface SupervisorTaskAutoscaler {
  void start();
  void stop();
}

// Update SupervisorManager to also manage the autoscaler in various places
// as needed, i.e. introduce code like:
SupervisorTaskAutoscaler autoscaler = spec.createAutoscaler();
if (autoscaler != null) {
  autoscaler.start(); // or autoscaler.stop(), depending on the place
}

// Now you can have multiple implementations of SupervisorTaskAutoscaler that
// do things their own way; we can even make it extensible.
```
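For what it is worth, one implementation of that interface could look roughly like the following. It is a minimal sketch, not a proposal for the final code: `PeriodicAutoscaler` is a made-up name, the `evaluation` runnable stands in for "collect stats, run the algorithm, update taskCount, signal the supervisor", and the interface is redeclared only so the snippet compiles on its own.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Redeclared from the sketch above so this snippet is self-contained.
interface SupervisorTaskAutoscaler {
  void start();
  void stop();
}

// Hypothetical implementation: runs one evaluation pass on a fixed delay.
final class PeriodicAutoscaler implements SupervisorTaskAutoscaler {
  private final Runnable evaluation;
  private final long periodMillis;
  private ScheduledExecutorService exec;

  PeriodicAutoscaler(Runnable evaluation, long periodMillis) {
    this.evaluation = evaluation;
    this.periodMillis = periodMillis;
  }

  @Override
  public synchronized void start() {
    if (exec != null) {
      return; // already started
    }
    exec = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "task-autoscaler");
      t.setDaemon(true); // do not keep the JVM alive just for autoscaling
      return t;
    });
    exec.scheduleWithFixedDelay(this::runSafely, 0, periodMillis, TimeUnit.MILLISECONDS);
  }

  @Override
  public synchronized void stop() {
    if (exec == null) {
      return; // never started or already stopped
    }
    exec.shutdownNow();
    exec = null;
  }

  synchronized boolean isRunning() {
    return exec != null;
  }

  // An uncaught exception would cancel all future runs, so swallow and continue.
  private void runSafely() {
    try {
      evaluation.run();
    } catch (RuntimeException e) {
      System.err.println("autoscaler evaluation failed: " + e);
    }
  }
}
```

The null check in the manager code above still works with this, since a spec without autoscaling can simply return null from `createAutoscaler()`.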
It would be great if it is feasible to re-arrange this PR into something like
the above. I may well have missed some important detail, but I wanted to note
my ideas here to hear what you/others think about them.