[ https://issues.apache.org/jira/browse/GOBBLIN-2185?focusedWorklogId=949926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-949926 ]
ASF GitHub Bot logged work on GOBBLIN-2185:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Dec/24 06:03
            Start Date: 24/Dec/24 06:03
    Worklog Time Spent: 10m
      Work Description: phet commented on code in PR #4087:
URL: https://github.com/apache/gobblin/pull/4087#discussion_r1896430362


##########
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/ddm/activity/impl/RecommendScalingForWorkUnitsLinearHeuristicImpl.java:
##########
@@ -27,16 +27,22 @@

 /**
- * Simple config-driven linear relationship between `remainingWork` and the resulting `setPoint`
+ * Simple config-driven linear recommendation for how many containers to use to complete the "remaining work" within a given {@link TimeBudget}, per:
  *
- *
- * TODO: describe algo!!!!!
+ * a. from {@link WorkUnitsSizeSummary}, find how many (remaining) "top-level" {@link org.apache.gobblin.source.workunit.MultiWorkUnit}s of some mean size
+ * b. from the configured {@link #AMORTIZED_NUM_BYTES_PER_MINUTE}, find the expected "processing rate" in bytes / minute
+ * 1. estimate the time required for processing a mean-sized `MultiWorkUnit` (MWU)
+ * c. from {@link JobState}, find per-container `MultiWorkUnit` parallelism capacity (aka. "worker-slots") to base the recommendation upon
+ * 2. calculate the per-container throughput of MWUs per minute
+ * 3. estimate the total per-container-minutes required to process all MWUs
+ * d. from the {@link TimeBudget}, find the target number of minutes in which to complete processing of all MWUs
+ * 4. recommend the number of containers so all MWU processing should finish within the target number of minutes

Review Comment:
   no, the input parameterization is lettered and the algo calculations are numbered


Issue Time Tracking
-------------------

    Worklog Id:     (was: 949926)
    Time Spent: 0.5h  (was: 20m)

> Implement heuristic-based GoT Dynamic Auto-Scaling
> --------------------------------------------------
>
>                 Key: GOBBLIN-2185
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2185
>             Project: Apache Gobblin
>          Issue Type: New Feature
>          Components: gobblin-core
>            Reporter: Kip Kohn
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Using a configured (constant) Data Transfer Rate (in bytes per time), presume
> a linear relationship holds between "Work" (WU) throughput and scaling the
> number of worker-containers. Provide a heuristic-based recommendation for
> how many worker-containers to allocate in order to complete processing of a
> job within a given time budget, with volume of Work conveyed via
> `WorkUnitsSizeSummary`

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
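
For readers following the Javadoc quoted in the review, the lettered inputs (a-d) and numbered calculations (1-4) can be sketched roughly as below. This is only an illustration of the described arithmetic, not the code from PR #4087: the class `LinearScalingSketch`, the method `recommendNumContainers`, and every parameter name are hypothetical, and the choices to round up and to return 0 when no work remains are assumptions rather than anything the message confirms.

```java
// Hypothetical sketch of the linear heuristic described in the Javadoc above.
// None of these names come from Gobblin; they exist only for illustration.
final class LinearScalingSketch {
  private LinearScalingSketch() {}

  /**
   * @param numTopLevelMwus         (a) count of remaining top-level MultiWorkUnits
   * @param meanMwuSizeBytes        (a) mean size of one such MultiWorkUnit, in bytes
   * @param amortizedBytesPerMinute (b) configured processing rate, in bytes / minute
   * @param workerSlotsPerContainer (c) MWUs one container can process in parallel
   * @param targetMinutes           (d) minutes allowed by the time budget
   * @return recommended number of containers (assumption: rounded up)
   */
  static int recommendNumContainers(long numTopLevelMwus, double meanMwuSizeBytes,
      long amortizedBytesPerMinute, int workerSlotsPerContainer, long targetMinutes) {
    if (numTopLevelMwus < 0 || meanMwuSizeBytes < 0 || amortizedBytesPerMinute <= 0
        || workerSlotsPerContainer <= 0 || targetMinutes <= 0) {
      throw new IllegalArgumentException("inputs must be non-negative; rate, slots, and target must be positive");
    }
    if (numTopLevelMwus == 0 || meanMwuSizeBytes == 0) {
      return 0; // assumption: no remaining work means no containers are needed
    }
    // 1. minutes needed to process one mean-sized MWU
    double minutesPerMwu = meanMwuSizeBytes / amortizedBytesPerMinute;
    // 2. MWUs one container completes per minute across all of its worker slots
    double mwusPerMinutePerContainer = workerSlotsPerContainer / minutesPerMwu;
    // 3. total container-minutes needed to process every MWU
    double containerMinutes = numTopLevelMwus / mwusPerMinutePerContainer;
    // 4. containers required to fit that work inside the target number of minutes
    return (int) Math.ceil(containerMinutes / targetMinutes);
  }

  public static void main(String[] args) {
    // 1000 MWUs of 500 MB each at 100 MB/min, 5 slots per container, 60-minute budget
    System.out.println(recommendNumContainers(1000, 500_000_000.0, 100_000_000L, 5, 60));
  }
}
```

With the inputs in `main`, each MWU takes 5 minutes, a 5-slot container finishes 1 MWU per minute, the job needs 1000 container-minutes, and a 60-minute budget therefore yields a recommendation of 17 containers under these assumptions.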