damccorm opened a new issue, #20298:
URL: https://github.com/apache/beam/issues/20298

   We have a Profiler[1] that is integrated with the SDK worker[1a]; however, it 
only saves CPU metrics[1b].
   We have a MemoryReporter util[2] that can log heap dumps; however, it is not 
documented on the Beam website and does not respect the --profile_memory and 
--profile_location options[3]. The --profile_memory flag currently works only 
for Dataflow Runner users who run non-portable batch pipelines, and profiles are 
saved only if memory usage between samples exceeds 1000M.
   
   We should improve the memory profiling experience for Portable Python users 
and consider writing a guide for the Beam website on how users can investigate 
OOMing pipelines.
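
   To illustrate the kind of threshold-based sampling described above (report 
only when memory grows by more than a set amount between samples), here is a 
minimal sketch using Python's standard tracemalloc module. It is not Beam's 
MemoryReporter and does not read any Beam options; the class name 
ThresholdMemorySampler, its methods, and the 1 MB threshold are hypothetical 
and chosen only for illustration.

```python
import tracemalloc


class ThresholdMemorySampler:
    """Hypothetical helper (not part of Beam): reports the top allocation
    sites only when traced memory grew by more than threshold_bytes since
    the last reported sample."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self._last_snapshot = None
        self._last_size = 0

    def start(self):
        # Begin tracing Python allocations and record a baseline.
        tracemalloc.start()
        self._last_snapshot = tracemalloc.take_snapshot()
        self._last_size, _ = tracemalloc.get_traced_memory()

    def sample(self, top_n=5):
        # Return None when growth since the baseline is under the threshold.
        current, _peak = tracemalloc.get_traced_memory()
        if current - self._last_size <= self.threshold_bytes:
            return None
        # Otherwise diff against the baseline, grouped by source line.
        snapshot = tracemalloc.take_snapshot()
        stats = snapshot.compare_to(self._last_snapshot, "lineno")[:top_n]
        # The new snapshot becomes the baseline for the next sample.
        self._last_snapshot = snapshot
        self._last_size = current
        return [str(stat) for stat in stats]

    def stop(self):
        tracemalloc.stop()


sampler = ThresholdMemorySampler(threshold_bytes=1_000_000)  # assumed 1 MB
sampler.start()
buffers = [bytearray(1024 * 1024) for _ in range(8)]  # simulate growth
for line in sampler.sample() or []:
    print(line)
sampler.stop()
```

   In a pipeline, sample() could be called periodically (for example from a 
background thread or at bundle boundaries) and its output written to the 
worker log; where and how often to sample is left open here.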
    
   [1] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L46
   [1a] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L157
   [1b] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L112
   [2] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L124
   [3] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/options/pipeline_options.py#L846
   
   Imported from Jira 
[BEAM-10200](https://issues.apache.org/jira/browse/BEAM-10200). Original Jira 
may contain additional context.
   Reported by: tvalentyn.

