damccorm opened a new issue, #20298:
URL: https://github.com/apache/beam/issues/20298
We have a Profiler[1] that is integrated with the SDK worker[1a]; however, it
only saves CPU metrics[1b].
We have a MemoryReporter util[2] that can log heap dumps; however, it is not
documented on the Beam website and does not respect the --profile_memory and
--profile_location options[3]. The --profile_memory flag currently works only
for Dataflow Runner users who run non-portable batch pipelines, and profiles
are saved only if memory usage between samples exceeds 1000M.
We should improve the memory profiling experience for Portable Python users
and consider writing a guide for the Beam website on how users can investigate
OOMing pipelines.
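As a rough illustration of the "save a profile only when memory grows past a threshold between samples" behavior described above, here is a minimal sketch built on the standard library's tracemalloc module. It is not Beam's MemoryReporter and does not reflect how the SDK worker is wired up; the class name `ThresholdMemorySampler`, its methods, and the 1 MB threshold are all hypothetical and chosen only for the example.

```python
import tracemalloc


class ThresholdMemorySampler:
    """Hypothetical helper, not a Beam API.

    Records a tracemalloc snapshot only when traced memory has grown by more
    than `threshold_bytes` since the previous sample.
    """

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self.last_size = 0
        self.snapshots = []

    def start(self):
        # Begin tracing allocations and remember the starting size.
        tracemalloc.start()
        self.last_size, _peak = tracemalloc.get_traced_memory()

    def sample(self):
        """Takes one sample; returns True if a snapshot was recorded."""
        current, _peak = tracemalloc.get_traced_memory()
        grew = (current - self.last_size) > self.threshold_bytes
        if grew:
            self.snapshots.append(tracemalloc.take_snapshot())
        self.last_size = current
        return grew

    def top_lines(self, limit=3):
        """Returns the largest allocation sites from the latest snapshot."""
        if not self.snapshots:
            return []
        return self.snapshots[-1].statistics("lineno")[:limit]

    def stop(self):
        tracemalloc.stop()


def demo():
    # Illustrative threshold of 1 MB; the issue mentions 1000M for Dataflow.
    sampler = ThresholdMemorySampler(threshold_bytes=1_000_000)
    sampler.start()
    first = sampler.sample()  # almost nothing allocated yet
    retained = [bytes(1024) for _ in range(5000)]  # hold roughly 5 MB
    second = sampler.sample()  # growth exceeds the threshold
    has_top = len(sampler.top_lines()) > 0
    sampler.stop()
    return first, second, len(retained), has_top
```

A worker-side integration would presumably call something like `sample()` periodically and write the recorded snapshots to the location given by --profile_location, but that wiring is an assumption and is left out here.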
[1]
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L46
[1a]
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L157
[1b]
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L112
[2]
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L124
[3]
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/options/pipeline_options.py#L846
Imported from Jira
[BEAM-10200](https://issues.apache.org/jira/browse/BEAM-10200). Original Jira
may contain additional context.
Reported by: tvalentyn.