[ https://issues.apache.org/jira/browse/GOBBLIN-2186?focusedWorklogId=950623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-950623 ]
ASF GitHub Bot logged work on GOBBLIN-2186:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Jan/25 05:34
            Start Date: 02/Jan/25 05:34
    Worklog Time Spent: 10m
      Work Description: phet commented on code in PR #4089:
URL: https://github.com/apache/gobblin/pull/4089#discussion_r1900542681

##########
gobblin-temporal/src/main/java/org/apache/gobblin/temporal/ddm/activity/impl/GenerateWorkUnitsImpl.java:
##########
@@ -150,26 +156,28 @@ public GenerateWorkUnitsResult generateWorkUnits(Properties jobProps, EventSubmi
   protected List<WorkUnit> generateWorkUnitsForJobStateAndCollectCleanupPaths(JobState jobState, EventSubmitterContext eventSubmitterContext, Closer closer, Set<String> pathsToCleanUp) throws ReflectiveOperationException {
+    // report (timer) metrics for "Work Discovery", *planning only* - NOT including WU prep, like serialization, `DestinationDatasetHandlerService`ing, etc.
+    // IMPORTANT: for accurate timing, SEPARATELY emit `.createWorkPreparationTimer`, to record time prior to measuring the WU size required for that one

Review Comment:
   originally, in `AbstractJobLauncher` the "WU creation timer" measured only the planning - https://github.com/apache/gobblin/blob/7dbeebf7fecc748ea3ef90cc318214cf26ba5afa/gobblin-runtime/src/main/java/org/apache/gobblin/runtime/AbstractJobLauncher.java#L476
   that is what's included in the `GaaSJobObservabilityEvent`. the timer for WU prep happens a bit later - https://github.com/apache/gobblin/blob/7dbeebf7fecc748ea3ef90cc318214cf26ba5afa/gobblin-runtime/src/main/java/org/apache/gobblin/runtime/AbstractJobLauncher.java#L549

   so in this comment:
   > "Work Discovery", *planning only* - NOT including WU prep, like serialization, ...

   I just meant that we're timing only planning/creation, not the preparation such as serialization.

   as for WU serialization, there is no existing, historical event strictly for that. typically that only takes a long time when memory-constrained and GC-bound. although we could consider adding a new event to time that, for purposes of right-sizing, GC stats are more interesting than the duration it happens to take. if anything, the former is what I'd prioritize.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 950623)
    Time Spent: 1h  (was: 50m)

> Ensure GoT jobs record Work Discovery planning timing for populating the
> `GaaSJobObservabilityEvent` fields `jobPlanning{Start,End}Timestamp`
> -------------------------------------------------------------------------
>
>                 Key: GOBBLIN-2186
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2186
>             Project: Apache Gobblin
>          Issue Type: New Feature
>          Components: gobblin-core
>            Reporter: Kip Kohn
>            Assignee: Abhishek Tiwari
>            Priority: Minor
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> `GaaSJobObservabilityEvent`s for Gobblin-on-Temporal jobs have no values set
> for the fields `jobPlanningStartTimestamp` and `jobPlanningEndTimestamp`
> because no `TimingEvent.LauncherTimings.WORK_UNITS_CREATION` GTE (to record
> those values) is emitted by `GenerateWorkUnitsImpl`

--
This message was sent by Atlassian Jira
(v8.20.10#820010)