churromorales opened a new issue, #12892:
URL: https://github.com/apache/druid/issues/12892

   ### Motivation
   
   Right now we need to know the number of task slots to allocate in advance
   to configure middle managers appropriately. We could leverage the k8s 
scheduler to do this to make configurations easier for customers. We could have 
an upper limit on concurrent tasks, but this would allow the system to become 
more elastic as when those slots are not being used, those Kubernetes resources 
originally allocated for the middle manager would be freed up.
   
   
   
   ### Proposed changes
   
   We need to reduce the shared dependency of the middle managers file system 
for the peon tasks. It is difficult to share files dynamically between a 
kubernetes container and the application launching it. ConfigMap files do exist 
for this purpose but there is currently no lifecycle management for a ConfigMap 
where after a job terminates the ConfigMap is deleted. Additionally many k8s 
environments have a quota on the number of ConfigMaps one can use. Additionally 
there are size limitations for how big ConfigMaps can be.
   
   *We need to resolve the following:* 
   
   1. The TaskLogPusher currently pushes the reports.json file from the middle 
manager, this needs to move into the task itself.  The task running in the k8s 
job cannot call back to the middle manager anymore. 
   2. Saving tasks: (only happen in k8s mode)
       1. Whenever we create intermediate persist segment files, we should also 
push that to deep storage in a directory specific to the task itself. 
       2. When the middle manager writes a restore.json file, that should also 
be pushed to deep storage (when in k8s mode). 
   3. Restoring tasks: (only happens in k8s mode)
       1. When a task starts up, the first thing it does is pull down the 
intermediate persist files and the restore.json file to a local volume in its 
own task dir. That way when it starts it behaves the same way it would as if it 
were running in non-k8s mode. 
   4. If the task itself can push segments, I don’t think it is unreasonable 
for it to push other relevant data.
   
   
   
   ### Rationale
   
   We would still leverage the middle manager for the first pass. This 
hopefully paves the way for one day removing the MiddleManager and having the 
overlord itself launch these tasks as k8s jobs. For the first pass due to how 
tightly coupled the filesystem is between the middle manager and peon tasks, we 
want to propose something that is a stepping stone in getting us to a middle 
manager-less world. 
   
   This patch has been proposed by someone in the community: 
https://github.com/apache/druid/pull/10910
   I reviewed this patch and I don’t believe it handles any sort of 
checkpointing.  I also believe it ignores the task report as well. I believe 
keeping tasks as k8s jobs instead of pods, will allow the k8s scheduler to 
handle the lifecycle better as well.  Right now with the pod based approach, if 
the middle manager unexpectedly dies, those pods are not cleaned up.  While we 
can utilize some of the work, I think some key features are missing from the 
way Druid currently works.  
   
   
   ### Operational impact
   Should be none, there will be one or two configuration options to launch 
tasks as k8s jobs. 
   
   ### Future work
   Remove the middle manager dependency completely.  Right now the filesystem 
is tightly coupled between the middle manager and tasks, once we can launch 
tasks from a middle manager successfully we can work on removing the middle 
manager altogether. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to