kfaraz commented on PR #13339: URL: https://github.com/apache/druid/pull/13339#issuecomment-1382665846
@churromorales, as @abhishekagarwal87 mentions, in a production environment we should _always_ be prudent with the default value of any config, especially one that dictates the usage of resources such as task slots. So I agree that in prod no one should be running with a task count of 1; I have not seen anyone do it in my experience either. My only concern with defaulting to `taskCount = numPartitions` is that we just might not have enough task slots. But I agree that Druid should use a better, more dynamic default out of the box. How about we do this:

- If `taskCount` is specified, we use that.
- If not specified, we use `taskCount = Math.min(numPartitions, totalWorkerCapacity / 2)`.

This way, we ensure that ingestion always runs successfully and has a better default value than 1. I agree that this might lead to multiple partitions being mapped to a single task, but only in the case where your setup doesn't have enough task slots; in a prod environment, we would ideally have enough. (To make this better, you may even choose a `taskCount` of which `numPartitions` is a multiple, so that each task gets the same number of partitions.)

Let me know what you think.
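The proposed fallback could be sketched roughly as below. This is only an illustration of the formula in the comment, not actual Druid code; the method and parameter names (`resolveTaskCount`, `numPartitions`, `totalWorkerCapacity`) are hypothetical. A `Math.max(..., 1)` floor is added here as an assumption, so that a cluster with very low worker capacity still gets at least one task:

```java
public class TaskCountDefaults
{
    /**
     * Hypothetical sketch of the proposed default:
     * use the configured taskCount if present, otherwise take
     * min(numPartitions, totalWorkerCapacity / 2), floored at 1.
     */
    static int resolveTaskCount(Integer configuredTaskCount, int numPartitions, int totalWorkerCapacity)
    {
        if (configuredTaskCount != null) {
            return configuredTaskCount;
        }
        // Never create more tasks than partitions, and never use more
        // than half the total worker capacity for this supervisor.
        return Math.max(1, Math.min(numPartitions, totalWorkerCapacity / 2));
    }
}
```

For example, with 10 partitions and a total worker capacity of 8, the default would be 4 tasks, so each task would read from multiple partitions only when slots are scarce.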
