kfaraz commented on PR #13339:
URL: https://github.com/apache/druid/pull/13339#issuecomment-1382665846

   @churromorales, as @abhishekagarwal87 mentions, in a production 
environment we should _always_ be cautious about relying on the default value 
of any config, especially one that dictates the usage of resources such as task 
slots. So I agree that no one should be running with a task count of 1 in prod, 
and I have not seen anyone do it in my experience either.
   
   The only concern I have with using a default value of `taskCount = 
numPartitions` is that the cluster might not have enough task slots. But I 
agree that Druid should use a better and more dynamic default out of the box.
   
   How about we do this:
   - If `taskCount` is specified, we use that.
   - If it is not specified, we use `taskCount = Math.min(numPartitions, 
totalWorkerCapacity/2)`.
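
   A rough sketch of that rule, only to make the proposal concrete. The class 
and method names are hypothetical and not taken from the Druid codebase, and 
the floor of 1 is my own assumption to cover the case where 
`totalWorkerCapacity/2` rounds down to 0:

```java
// Hypothetical helper, not actual Druid code.
final class TaskCountDefaults {
    private TaskCountDefaults() {}

    /**
     * Returns the configured taskCount if present; otherwise
     * min(numPartitions, totalWorkerCapacity / 2), floored at 1 (assumption).
     */
    static int resolveTaskCount(Integer configuredTaskCount,
                                int numPartitions,
                                int totalWorkerCapacity) {
        if (configuredTaskCount != null) {
            // An explicit user setting always wins.
            return configuredTaskCount;
        }
        int halfCapacity = totalWorkerCapacity / 2; // integer division
        return Math.max(1, Math.min(numPartitions, halfCapacity));
    }
}
```

   For example, with 12 partitions and 8 task slots this would pick 4 tasks, 
and with 2 partitions and 20 task slots it would pick 2.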
   
   This way, the default never asks for more than half of the available task 
slots, so ingestion can still run, and it is a better default than 1. I agree 
that this might lead to multiple partitions being mapped to a single task, but 
only when the setup doesn't have enough task slots. In a prod environment, we 
would ideally have enough task slots. 
   (To make this better, we could even choose a `taskCount` that evenly divides 
`numPartitions`, so that each task gets the same number of partitions.)
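
   One possible way to pick such a value, again only as a sketch with 
hypothetical names: take the largest divisor of `numPartitions` that does not 
exceed the upper bound computed above. This assumes `numPartitions >= 1`.

```java
// Hypothetical helper, not actual Druid code.
final class EvenTaskCount {
    private EvenTaskCount() {}

    /**
     * Largest c <= upperBound such that numPartitions % c == 0,
     * so every task is assigned the same number of partitions.
     * Falls back to 1 when no larger divisor fits.
     */
    static int largestDivisorAtMost(int numPartitions, int upperBound) {
        int start = Math.min(numPartitions, upperBound);
        for (int candidate = start; candidate > 1; candidate--) {
            if (numPartitions % candidate == 0) {
                return candidate;
            }
        }
        return 1;
    }
}
```

   The trade-off is that a prime `numPartitions` larger than the bound drops 
back to a single task, so this may only make sense as an optional refinement.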
   
   Let me know what you think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

