himanshug commented on pull request #10524:
URL: https://github.com/apache/druid/pull/10524#issuecomment-760533912


   As I go through the code, I notice that, overall, the patch does the following:
   
   (1) periodically updates lag statistics in the background,
   (2) periodically triggers the "autoscaling algorithm" to check whether tasks need to be scaled up or down,
   (3) if the "autoscaling algorithm" determines that a scale up/down is needed, updates the spec in the DB with the new taskCount, and
   (4) then "triggers" the scale up/down.
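
   For illustration, step (2) could be as simple as a threshold rule over the lag samples collected in step (1). This is a minimal sketch under assumptions: `LagThresholdHeuristic`, its two thresholds, the min/max bounds and the one-task-at-a-time step are all hypothetical and are not the algorithm this PR actually implements.

   ```java
   /**
    * Hypothetical threshold heuristic for step (2). The thresholds and bounds stand
    * in for "heuristics provided by the user"; they are not this PR's real config.
    */
   final class LagThresholdHeuristic
   {
     private final long scaleOutLag;   // average lag above which one task is added
     private final long scaleInLag;    // average lag below which one task is removed
     private final int minTaskCount;
     private final int maxTaskCount;

     LagThresholdHeuristic(long scaleOutLag, long scaleInLag, int minTaskCount, int maxTaskCount)
     {
       this.scaleOutLag = scaleOutLag;
       this.scaleInLag = scaleInLag;
       this.minTaskCount = minTaskCount;
       this.maxTaskCount = maxTaskCount;
     }

     /** Returns the desired taskCount given the lag samples collected in step (1). */
     int desiredTaskCount(int currentTaskCount, long[] lagSamples)
     {
       if (lagSamples.length == 0) {
         return currentTaskCount; // no stats yet: leave the taskCount unchanged
       }
       long sum = 0;
       for (long lag : lagSamples) {
         sum += lag;
       }
       long averageLag = sum / lagSamples.length;
       if (averageLag > scaleOutLag) {
         return Math.min(currentTaskCount + 1, maxTaskCount); // scale up, capped at max
       }
       if (averageLag < scaleInLag) {
         return Math.max(currentTaskCount - 1, minTaskCount); // scale down, floored at min
       }
       return currentTaskCount;
     }
   }
   ```

   Steps (3) and (4) would then only run when the returned value differs from the current taskCount.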
   
   In the current design, everything is embedded inside `SeekableStreamSupervisor`. However, it looks like (1), (2) and (3) could happen independently and outside of `SeekableStreamSupervisor`, which would avoid putting more logic in there. The only change needed inside `SeekableStreamSupervisor` would be to make it accept a way to trigger task creation/deletion based on the updated taskCount in the spec.
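
   To make that single supervisor-side change concrete, here is a minimal sketch, assuming the supervisor receives a "taskCount changed" signal from outside and applies it from its own run loop. `ToySupervisor`, `signalTaskCountChanged` and `processSignals` are hypothetical names, not Druid APIs, and the toy ignores the real work of reassigning partitions and gracefully stopping tasks.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.BlockingQueue;
   import java.util.concurrent.LinkedBlockingQueue;

   /** Toy stand-in for the supervisor; only reconcile-on-new-taskCount is modeled. */
   final class ToySupervisor
   {
     // Signals may arrive from the external autoscaler thread, hence a concurrent queue.
     private final BlockingQueue<Integer> taskCountSignals = new LinkedBlockingQueue<>();
     // Touched only by the supervisor's own run loop.
     private final List<String> runningTasks = new ArrayList<>();

     /** The single new entry point, called after the spec's taskCount was updated. */
     void signalTaskCountChanged(int newTaskCount)
     {
       taskCountSignals.add(newTaskCount);
     }

     /** Called from the supervisor's run loop; applies the latest pending signal, if any. */
     void processSignals()
     {
       Integer latest = null;
       Integer next;
       while ((next = taskCountSignals.poll()) != null) {
         latest = next; // only the most recent taskCount matters
       }
       if (latest == null) {
         return;
       }
       while (runningTasks.size() < latest) {
         runningTasks.add("task-" + runningTasks.size()); // "create" a task
       }
       while (runningTasks.size() > latest) {
         runningTasks.remove(runningTasks.size() - 1); // "stop" the newest task
       }
     }

     int runningTaskCount()
     {
       return runningTasks.size();
     }
   }
   ```

   Coalescing pending signals means a burst of scale decisions results in a single reconciliation towards the newest taskCount.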
   
   The idea can be taken even a little further. You have used a specific "autoscaling algorithm" based on heuristics provided by the user. I can easily imagine other versions of the "autoscaling algorithm" with more intelligence built in, relying on less and less information from the user. I can even see the need to support multiple "autoscaling algorithms" that users can pick and choose from, so that different "autoscaling algorithms" can be tried out without impacting existing users.
   In fact, and this is a far-off thought, users might want their own autoscaler algorithms based on stats specific to their deployment instead of Kafka topic lag, e.g. checking the load metrics of their pipeline, which could trigger autoscaling well before things start lagging.
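
   A minimal sketch of how multiple algorithms could sit behind one interface and be picked by name, so a new algorithm can be tried without changing behavior for existing users. `AutoscalingAlgorithm`, the `"lag"`/`"load"` names, the `"cpuLoad"` metric and the thresholds are all hypothetical.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.function.Supplier;

   /** Hypothetical pluggable algorithm: maps current stats to a desired taskCount. */
   interface AutoscalingAlgorithm
   {
     int desiredTaskCount(int currentTaskCount, Map<String, Double> metrics);
   }

   /** Toy lag-based rule: add a task when lag is high. */
   final class LagAlgorithm implements AutoscalingAlgorithm
   {
     @Override
     public int desiredTaskCount(int current, Map<String, Double> metrics)
     {
       double lag = metrics.getOrDefault("lag", 0.0);
       return lag > 10_000 ? current + 1 : current;
     }
   }

   /** Toy load-based rule that can react before lag builds up. */
   final class LoadAlgorithm implements AutoscalingAlgorithm
   {
     @Override
     public int desiredTaskCount(int current, Map<String, Double> metrics)
     {
       double load = metrics.getOrDefault("cpuLoad", 0.0);
       return load > 0.8 ? current + 1 : current;
     }
   }

   /** Registry so the spec can name an algorithm; extensions could register more. */
   final class AutoscalingAlgorithms
   {
     private final Map<String, Supplier<AutoscalingAlgorithm>> registry = new HashMap<>();

     AutoscalingAlgorithms()
     {
       register("lag", LagAlgorithm::new);
       register("load", LoadAlgorithm::new);
     }

     void register(String name, Supplier<AutoscalingAlgorithm> factory)
     {
       registry.put(name, factory);
     }

     AutoscalingAlgorithm create(String name)
     {
       Supplier<AutoscalingAlgorithm> factory = registry.get(name);
       if (factory == null) {
         throw new IllegalArgumentException("Unknown autoscaling algorithm: " + name);
       }
       return factory.get();
     }
   }
   ```

   Within Druid, choosing the implementation from the spec JSON would more naturally go through Jackson polymorphic types (`@JsonTypeInfo`/`@JsonSubTypes`) than a hand-rolled registry; the map here just keeps the sketch self-contained.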
   
   One quick line of thought I had while reading the code was that we could have something like:
   
   ```java
   // Add a new method to SupervisorSpec
   public interface SupervisorSpec
   {
     // ... existing methods ...

     SupervisorTaskAutoscaler createAutoscaler();

     // ... existing methods ...
   }

   // start() sets up whatever scheduler is needed to collect stats and run the
   // autoscaler's own scaling logic; if needed, it then updates taskCount in the
   // spec and signals the supervisor to reconcile based on the new taskCount.
   public interface SupervisorTaskAutoscaler
   {
     void start();

     void stop();
   }

   // Update SupervisorManager to also manage the autoscaler wherever needed,
   // i.e. introduce code like:
   SupervisorTaskAutoscaler autoscaler = spec.createAutoscaler();
   if (autoscaler != null) {
     autoscaler.start(); // or autoscaler.stop(), depending on the lifecycle event
   }

   // Now there can be multiple implementations of SupervisorTaskAutoscaler that
   // each do things their own way; we could even make this extensible.
   ```
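
   One possible shape for an implementation of the `SupervisorTaskAutoscaler` interface sketched above, assuming a single scheduled thread per autoscaler. The interface is repeated so the example compiles on its own; `PeriodicTaskAutoscaler` and its three callbacks are hypothetical stand-ins for reading the spec, running an algorithm, and updating the spec plus signalling the supervisor.

   ```java
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;
   import java.util.function.IntConsumer;
   import java.util.function.IntSupplier;
   import java.util.function.IntUnaryOperator;

   // Repeated from the sketch above so this compiles standalone.
   interface SupervisorTaskAutoscaler
   {
     void start();

     void stop();
   }

   /** Hypothetical autoscaler that periodically re-evaluates the taskCount. */
   final class PeriodicTaskAutoscaler implements SupervisorTaskAutoscaler
   {
     private final IntSupplier currentTaskCount;   // e.g. read taskCount from the stored spec
     private final IntUnaryOperator algorithm;     // current taskCount -> desired taskCount
     private final IntConsumer applyNewTaskCount;  // e.g. update spec in DB, then signal supervisor
     private final long periodMillis;
     private ScheduledExecutorService exec;        // guarded by "this"

     PeriodicTaskAutoscaler(
         IntSupplier currentTaskCount,
         IntUnaryOperator algorithm,
         IntConsumer applyNewTaskCount,
         long periodMillis
     )
     {
       this.currentTaskCount = currentTaskCount;
       this.algorithm = algorithm;
       this.applyNewTaskCount = applyNewTaskCount;
       this.periodMillis = periodMillis;
     }

     @Override
     public synchronized void start()
     {
       if (exec != null) {
         return; // already running
       }
       exec = Executors.newSingleThreadScheduledExecutor();
       exec.scheduleWithFixedDelay(this::checkOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
     }

     @Override
     public synchronized void stop()
     {
       if (exec == null) {
         return; // not running
       }
       exec.shutdownNow();
       exec = null;
     }

     private void checkOnce()
     {
       try {
         int current = currentTaskCount.getAsInt();
         int desired = algorithm.applyAsInt(current);
         if (desired != current) {
           applyNewTaskCount.accept(desired);
         }
       }
       catch (RuntimeException e) {
         // Real code should log this; catching keeps future checks scheduled.
       }
     }
   }
   ```

   The catch block matters because a periodic task scheduled on a `ScheduledExecutorService` that throws has its subsequent executions suppressed, which would silently disable autoscaling.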
   
   It would be great if it is feasible to rearrange this PR into something like the above. I may well have missed some important detail, but I wanted to note my ideas here to see what you and others think.

