[ https://issues.apache.org/jira/browse/GOBBLIN-2186?focusedWorklogId=950610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-950610 ]
ASF GitHub Bot logged work on GOBBLIN-2186: ------------------------------------------- Author: ASF GitHub Bot Created on: 01/Jan/25 16:19 Start Date: 01/Jan/25 16:19 Worklog Time Spent: 10m Work Description: phet opened a new pull request, #4089: URL: https://github.com/apache/gobblin/pull/4089 Dear Gobblin maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR" - https://issues.apache.org/jira/browse/GOBBLIN-2186 ### Description - [ ] Here are some details about my PR, including screenshots (if applicable): `GaaSJobObservabilityEvent`s for Gobblin-on-Temporal jobs presently have no values set for the fields `jobPlanningStartTimestamp` and `jobPlanningEndTimestamp`. This is because, in contrast with Gobblin-on-MR, `GenerateWorkUnitsImpl` emits no `TimingEvent.LauncherTimings.WORK_UNITS_CREATION` GTE to record such values. The historical impediment was that Temporal does not permit Activities to launch other Activities (only Workflows may do that). `TemporalEventTimer`, however, emitted its GTE on a separate Activity for reliability. This change reworks `TemporalEventTimer.Factory` to be useable within an `Activity`, by refactoring into complementary concrete realizations - `TemporalEventTimer.WithinWorkflowFactory` and `TemporalEventTimer.WithinActivityFactory`. The latter sets up direct GTE emission within the same (current) `Activity`. The alternative of reworking `GenerateWorkUnitsImpl` from an `Activity` into a `Workflow` would be challenging to accomplish, because "Work Discovery" may be quite long running \- 15-30 mins is not uncommon for large full-initial-copy Iceberg-Distcp jobs \- yet Temporal [allows a workflow-task to run for a maximum of 2 minutes](https://docs.temporal.io/encyclopedia/detecting-workflow-failures#workflow-task-timeout). (Since we wish to emit this timing GTE *in the midst of* Work Discovery, it would not be easy to have one sub-activity first performing all of Work Discovery and another subsequently doing GTE emission once the first concludes.) In addition, now that this refactoring happily unblocks GTE-emission from `GenerateWorkUnitsImpl` we were primed to emit a new GTE, which is actually even more critical to emit *in the midst of* Work Discovery: one to record the volume of `WorkUnit`s discovered. To be useful, this must be done prior to WU serialization, which is memory-intensive and may OOM. If that should happen this GTE's durable measurement would inform re-configuration for any subsequent re-attempt. Accordingly, we preserve total size and counts, means, and medians of both original and bin packed WUs. This was inspired by a pair of log messages produced within `CopySource::getWorkUnits`: ``` Statistics for ConcurrentBoundedPriorityIterable: {ResourcePool: {softBound: [ ... ], hardBound: [ ...]},totalResourcesUsed: [ ... ], maxRequirementPerDimension: [entities: 231943.0, bytesCopied: 1.22419622769628E14], ... } ``` and ``` org.apache.gobblin.data.management.copy.CopySource - Bin packed work units. Initial work units: 27252, packed work units: 13175, max weight per bin: 500000000, max work units per bin: 100. ``` We often seek out these messages to diagnose and resolve failures during during `WorkUnit` Discovery, but rather than merely logging, this durable emission sets up machine-to-machine auto-right-sizing to orchestrate a potential re-attempt (upon WU serialization OOM). ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: manual execution ### Commits - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. Body explains "what" and "why", not "how" Issue Time Tracking ------------------- Worklog Id: (was: 950610) Remaining Estimate: 0h Time Spent: 10m > Ensure GoT jobs record Work Discovery planning timing for populating the > `GaaSJobObservabilityEvent` fields `jobPlanning{Start,End}Timestamp` > --------------------------------------------------------------------------------------------------------------------------------------------- > > Key: GOBBLIN-2186 > URL: https://issues.apache.org/jira/browse/GOBBLIN-2186 > Project: Apache Gobblin > Issue Type: New Feature > Components: gobblin-core > Reporter: Kip Kohn > Assignee: Abhishek Tiwari > Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > `GaaSJobObservabilityEvent`s for Gobblin-on-Temporal jobs have no values set > for the fields `jobPlanningStartTimestamp` and `jobPlanningEndTimestamp` > because no `TimingEvent.LauncherTimings.WORK_UNITS_CREATION` GTE (to record > those values) is emitted by `GenerateWorkUnitsImpl` -- This message was sent by Atlassian Jira (v8.20.10#820010)