phet opened a new pull request, #4089:
URL: https://github.com/apache/gobblin/pull/4089

   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
       - https://issues.apache.org/jira/browse/GOBBLIN-2186
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if 
applicable):
   
   `GaaSJobObservabilityEvent`s for Gobblin-on-Temporal jobs presently have no 
values set for the fields `jobPlanningStartTimestamp` and 
`jobPlanningEndTimestamp`.  This is because, in contrast with Gobblin-on-MR, 
`GenerateWorkUnitsImpl` emits no 
`TimingEvent.LauncherTimings.WORK_UNITS_CREATION` GTE to record such values.
   
   The historical impediment was that Temporal does not permit Activities to 
launch other Activities (only Workflows may do that).  `TemporalEventTimer`, 
however, emitted its GTE on a separate Activity for reliability.  This change 
reworks `TemporalEventTimer.Factory` to be useable within an `Activity`, by 
refactoring into complementary concrete realizations - 
`TemporalEventTimer.WithinWorkflowFactory` and 
`TemporalEventTimer.WithinActivityFactory`.  The latter sets up direct GTE 
emission within the same (current) `Activity`.
   
   The alternative of reworking `GenerateWorkUnitsImpl` from an `Activity` into 
a `Workflow` would be challenging to accomplish, because "Work Discovery" may 
be quite long running \- 15-30 mins is not uncommon for large full-initial-copy 
Iceberg-Distcp jobs \- yet Temporal [allows a workflow-task to run for a 
maximum of 2 
minutes](https://docs.temporal.io/encyclopedia/detecting-workflow-failures#workflow-task-timeout).
  (Since we wish to emit this timing GTE *in the midst of* Work Discovery, it 
would not be easy to have one sub-activity first performing all of Work 
Discovery and another subsequently doing GTE emission once the first concludes.)
   
   In addition, now that this refactoring happily unblocks GTE-emission from 
`GenerateWorkUnitsImpl` we were primed to emit a new GTE, which is actually 
even more critical to emit *in the midst of* Work Discovery: one to record the 
volume of `WorkUnit`s discovered.   To be useful, this must be done prior to WU 
serialization, which is memory-intensive and may OOM.  If that should happen 
this GTE's durable measurement would inform re-configuration for any subsequent 
re-attempt.
   
   Accordingly, we preserve total size and counts, means, and medians of both 
original and bin packed WUs.  This was inspired by a pair of log messages 
produced within `CopySource::getWorkUnits`:
   ```
   Statistics for ConcurrentBoundedPriorityIterable: {ResourcePool: {softBound: 
[ ... ], hardBound: [ ...]},totalResourcesUsed: [ ... ], 
maxRequirementPerDimension: [entities: 231943.0, bytesCopied: 
1.22419622769628E14], ... }
   ```
   and
   ```
   org.apache.gobblin.data.management.copy.CopySource - Bin packed work units. 
Initial work units: 27252, packed work units: 13175, max weight per bin: 
500000000, max work units per bin: 100.
   ```
   
   We often seek out these messages to diagnose and resolve failures during 
during `WorkUnit` Discovery, but rather than merely logging, this durable 
emission sets up machine-to-machine auto-right-sizing to orchestrate a 
potential re-attempt (upon WU serialization OOM).
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   manual execution
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to