khandelwal-prateek opened a new pull request, #4106:
URL: https://github.com/apache/gobblin/pull/4106

   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
       - https://issues.apache.org/jira/browse/GOBBLIN-2199
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   - Moves activity parallelism within a container to the worker level only, 
and uses 1 worker per container by default.
   - Disables dynamic scaling by default and adds a config to enable or 
disable it (which can be set at the job/template level).
   - Adds configs to set the container throughput and the maximum number of 
work units (WUs) per container, which are used to compute the container count.
   - Substantially refactors the calcDerivationSetPoint method in the 
RecommendScalingForWorkUnitsLinearHeuristicImpl class. The recommended number 
of containers for dynamic scaling is now calculated by considering two 
constraints separately:
     - **Throughput Constraint:** 
       - Computes the total bytes to be processed based on the top-level work 
units (MWUs) and their average size.
       - Determines the container’s processing rate using the amortized 
throughput per thread and container capacity (derived from the number of 
workers per container and threads per worker).
       - Estimates the total container minutes required and calculates the 
number of containers needed to meet the job's time budget.
     - **Work Unit Constraint:**
       - Calculates the required number of containers based on the total number 
of constituent work units and the maximum number of work units allowed per 
container.
   
     - The method then returns the maximum of these two calculated values to 
ensure that both constraints are met. 
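
The two constraints described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in RecommendScalingForWorkUnitsLinearHeuristicImpl: the class name, method names, and parameter names are hypothetical, throughput is assumed to be expressed in bytes per minute per thread, the time budget in minutes, and rounding up to a whole container is an assumption.

```java
// Hypothetical sketch of the two-constraint container calculation.
// None of these names come from the Gobblin code base.
final class ContainerCountSketch {

    private ContainerCountSketch() {}

    /** Throughput constraint: containers needed to process all bytes within the time budget. */
    static int containersForThroughput(long numTopLevelWorkUnits,
                                       double avgBytesPerTopLevelWorkUnit,
                                       double bytesPerMinutePerThread,
                                       int workersPerContainer,
                                       int threadsPerWorker,
                                       long timeBudgetMinutes) {
        // Total bytes = number of top-level work units (MWUs) * their average size.
        double totalBytes = numTopLevelWorkUnits * avgBytesPerTopLevelWorkUnit;
        // Container capacity = workers per container * threads per worker.
        int containerCapacity = workersPerContainer * threadsPerWorker;
        // Container processing rate = amortized per-thread throughput * capacity.
        double containerBytesPerMinute = bytesPerMinutePerThread * containerCapacity;
        // Total container-minutes of work, spread across the time budget.
        double containerMinutes = totalBytes / containerBytesPerMinute;
        return (int) Math.ceil(containerMinutes / timeBudgetMinutes); // assumed: round up
    }

    /** Work unit constraint: containers needed so none exceeds the per-container WU cap. */
    static int containersForWorkUnits(long numConstituentWorkUnits, int maxWorkUnitsPerContainer) {
        return (int) Math.ceil((double) numConstituentWorkUnits / maxWorkUnitsPerContainer);
    }

    /** Recommended count is the larger of the two, so both constraints are met. */
    static int recommendContainers(int byThroughput, int byWorkUnits) {
        return Math.max(byThroughput, byWorkUnits);
    }

    public static void main(String[] args) {
        // Made-up inputs: 1000 MWUs of 500 MB each, 100 MB/min per thread,
        // 1 worker x 20 threads per container, 60-minute budget.
        int byThroughput = containersForThroughput(1000L, 5.0e8, 1.0e8, 1, 20, 60L);
        // Made-up inputs: 12000 constituent WUs, at most 2000 per container.
        int byWorkUnits = containersForWorkUnits(12_000L, 2_000);
        System.out.println("throughput=" + byThroughput
                + " workUnits=" + byWorkUnits
                + " recommended=" + recommendContainers(byThroughput, byWorkUnits));
    }
}
```

With these made-up inputs the throughput constraint needs 250 container-minutes over a 60-minute budget, and the work unit cap needs 12000 / 2000 containers, so the work unit constraint dominates. Raising the hypothetical cap to 5000 WUs per container would make the throughput constraint the binding one.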
   
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.