phet opened a new pull request, #4089: URL: https://github.com/apache/gobblin/pull/4089
Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-2186 ### Description - [ ] Here are some details about my PR, including screenshots (if applicable): `GaaSJobObservabilityEvent`s for Gobblin-on-Temporal jobs presently have no values set for the fields `jobPlanningStartTimestamp` and `jobPlanningEndTimestamp`. This is because, in contrast with Gobblin-on-MR, `GenerateWorkUnitsImpl` emits no `TimingEvent.LauncherTimings.WORK_UNITS_CREATION` GTE to record such values. The historical impediment was that Temporal does not permit Activities to launch other Activities (only Workflows may do that). `TemporalEventTimer`, however, emitted its GTE on a separate Activity for reliability. This change reworks `TemporalEventTimer.Factory` to be useable within an `Activity`, by refactoring into complementary concrete realizations - `TemporalEventTimer.WithinWorkflowFactory` and `TemporalEventTimer.WithinActivityFactory`. The latter sets up direct GTE emission within the same (current) `Activity`. The alternative of reworking `GenerateWorkUnitsImpl` from an `Activity` into a `Workflow` would be challenging to accomplish, because "Work Discovery" may be quite long running \- 15-30 mins is not uncommon for large full-initial-copy Iceberg-Distcp jobs \- yet Temporal [allows a workflow-task to run for a maximum of 2 minutes](https://docs.temporal.io/encyclopedia/detecting-workflow-failures#workflow-task-timeout). (Since we wish to emit this timing GTE *in the midst of* Work Discovery, it would not be easy to have one sub-activity first performing all of Work Discovery and another subsequently doing GTE emission once the first concludes.) In addition, now that this refactoring happily unblocks GTE-emission from `GenerateWorkUnitsImpl` we were primed to emit a new GTE, which is actually even more critical to emit *in the midst of* Work Discovery: one to record the volume of `WorkUnit`s discovered. To be useful, this must be done prior to WU serialization, which is memory-intensive and may OOM. If that should happen this GTE's durable measurement would inform re-configuration for any subsequent re-attempt. Accordingly, we preserve total size and counts, means, and medians of both original and bin packed WUs. This was inspired by a pair of log messages produced within `CopySource::getWorkUnits`: ``` Statistics for ConcurrentBoundedPriorityIterable: {ResourcePool: {softBound: [ ... ], hardBound: [ ...]},totalResourcesUsed: [ ... ], maxRequirementPerDimension: [entities: 231943.0, bytesCopied: 1.22419622769628E14], ... } ``` and ``` org.apache.gobblin.data.management.copy.CopySource - Bin packed work units. Initial work units: 27252, packed work units: 13175, max weight per bin: 500000000, max work units per bin: 100. ``` We often seek out these messages to diagnose and resolve failures during during `WorkUnit` Discovery, but rather than merely logging, this durable emission sets up machine-to-machine auto-right-sizing to orchestrate a potential re-attempt (upon WU serialization OOM). ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: manual execution ### Commits - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@gobblin.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org