khandelwal-prateek opened a new pull request, #4106:

URL: https://github.com/apache/gobblin/pull/4106
Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [x] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    - https://issues.apache.org/jira/browse/GOBBLIN-2199

### Description
- [x] Here are some details about my PR, including screenshots (if applicable):
    - Moves activity parallelism within a container to the worker level only, and uses 1 worker per container by default.
    - Disables dynamic scaling by default. Adds a config to enable/disable dynamic scaling, which can be set at the job/template level.
    - Adds configs to set the container throughput and the maximum number of WUs per container, both of which are used to compute the container count.
    - Substantially refactors the `calcDerivationSetPoint` method in the `RecommendScalingForWorkUnitsLinearHeuristicImpl` class. The refactored method improves the calculation of the recommended number of containers for dynamic scaling by considering two constraints separately:
        - **Throughput Constraint:**
            - Computes the total bytes to be processed from the top-level work units (MWUs) and their average size.
            - Determines the container's processing rate from the amortized throughput per thread and the container capacity (derived from the number of workers per container and threads per worker).
            - Estimates the total container-minutes required and calculates the number of containers needed to finish within the job's time budget.
        - **Work Unit Constraint:**
            - Calculates the number of containers required from the total number of constituent work units and the maximum number of work units allowed per container.
    - The method then returns the maximum of these two values, so that both constraints are satisfied.
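The two-constraint calculation described above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the actual code in `RecommendScalingForWorkUnitsLinearHeuristicImpl`: the class `ContainerCountSketch`, every method and parameter name, the use of bytes per minute as the throughput unit, and the assumption that container capacity is workers times threads are all hypothetical.

```java
/**
 * Hypothetical sketch of a two-constraint container-count estimate.
 * Names and units (bytes, bytes per minute, minutes) are assumptions,
 * not Gobblin's actual API.
 */
public final class ContainerCountSketch {

  private ContainerCountSketch() {}

  /** Throughput constraint: containers needed to process all bytes within the time budget. */
  public static int containersForThroughput(long numTopLevelWorkUnits, double avgBytesPerWorkUnit,
      double bytesPerMinutePerThread, int workersPerContainer, int threadsPerWorker,
      double jobTimeBudgetMinutes) {
    if (bytesPerMinutePerThread <= 0 || workersPerContainer <= 0 || threadsPerWorker <= 0
        || jobTimeBudgetMinutes <= 0) {
      throw new IllegalArgumentException("rates, capacities and time budget must be positive");
    }
    // Total bytes to process, estimated from the top-level work units (MWUs) and their average size.
    double totalBytes = numTopLevelWorkUnits * avgBytesPerWorkUnit;
    // Assumed container capacity: concurrent processing threads available in one container.
    int containerCapacity = workersPerContainer * threadsPerWorker;
    // Container processing rate, assuming throughput scales linearly with thread count.
    double containerBytesPerMinute = bytesPerMinutePerThread * containerCapacity;
    // Total container-minutes of work, spread over the job's time budget.
    double totalContainerMinutes = totalBytes / containerBytesPerMinute;
    return (int) Math.ceil(totalContainerMinutes / jobTimeBudgetMinutes);
  }

  /** Work unit constraint: containers needed so that none exceeds its work unit cap. */
  public static int containersForWorkUnits(long totalConstituentWorkUnits, long maxWorkUnitsPerContainer) {
    if (maxWorkUnitsPerContainer <= 0) {
      throw new IllegalArgumentException("maxWorkUnitsPerContainer must be positive");
    }
    // Ceiling division on longs avoids floating-point rounding.
    return (int) ((totalConstituentWorkUnits + maxWorkUnitsPerContainer - 1) / maxWorkUnitsPerContainer);
  }

  /** Recommendation: the larger of the two counts, so that both constraints hold. */
  public static int recommendContainers(long numTopLevelWorkUnits, double avgBytesPerWorkUnit,
      long totalConstituentWorkUnits, double bytesPerMinutePerThread, int workersPerContainer,
      int threadsPerWorker, double jobTimeBudgetMinutes, long maxWorkUnitsPerContainer) {
    int byThroughput = containersForThroughput(numTopLevelWorkUnits, avgBytesPerWorkUnit,
        bytesPerMinutePerThread, workersPerContainer, threadsPerWorker, jobTimeBudgetMinutes);
    int byWorkUnits = containersForWorkUnits(totalConstituentWorkUnits, maxWorkUnitsPerContainer);
    return Math.max(byThroughput, byWorkUnits);
  }

  public static void main(String[] args) {
    // Made-up inputs: 100 MWUs averaging 1 GiB, 100 MiB/min per thread, 1 worker x 4 threads,
    // a 60-minute budget, and 5000 constituent WUs capped at 2000 per container.
    int containers = recommendContainers(100, 1024.0 * 1024 * 1024, 5000,
        100.0 * 1024 * 1024, 1, 4, 60.0, 2000);
    System.out.println(containers); // prints 5
  }
}
```

With these made-up inputs the throughput constraint dominates: 256 container-minutes over a 60-minute budget rounds up to 5 containers, while the work unit constraint needs only 3. Taking the maximum means a job with many small work units can still be scaled out by the work unit cap even when its byte volume is low.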
### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"