kennknowles opened a new issue, #18937:
URL: https://github.com/apache/beam/issues/18937

   Jira tracking work around CPU profiling the Go SDK.
   
   Prior to this, a hook that enables the Go CPU and trace profiling libraries 
was added in the following PR:
   
https://github.com/apache/beam/commit/adb78f6c3055693a053a89bdbaa46ca86685a290
   
   At present, it's broken on distributed runners.
   
https://github.com/apache/beam/blob/410ad7699621e28433d81809f6b9c42fe7bd6a60/sdks/go/pkg/beam/x/hooks/perf/perf.go#L50
   See also: 
https://stackoverflow.com/questions/67076744/cpu-profiling-not-covering-all-the-vcpu-time-of-apache-beam-pipeline-on-dataflow/67082075?noredirect=1#comment118629835_67082075
   
   The original intent was to profile each bundle individually, but this is 
at odds with how CPU profiling works in Go, which measures the whole 
process.
   
   At this point, different bundles start and stop each other's profiling, 
which leads to severe undercounting. A better approach would be to start 
profiling on Init and sample periodically, so that we get ~30 second chunks 
or similar, written to a new file each time, per worker. This at least avoids 
losing most of the profiling information at the end of a worker's life. 
(Profiles can be "merged" after the fact, so if profiling is stopped and 
started again right away, little is lost.)
   
   Optionally, we should add a Teardown trigger to the hooks so we can do a 
clean exit in this case, but it's not a hard requirement for a first pass.
   
   Optionally, figure out a clean way to get a job to work with Google Cloud 
Profiler, likely as a different hook. 
   https://cloud.google.com/profiler/docs/profiling-go
   
   Imported from Jira 
[BEAM-4224](https://issues.apache.org/jira/browse/BEAM-4224). Original Jira may 
contain additional context.
   Reported by: lostluck.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

