Just saw this, one year later. I observed this about four years ago, in a different job from the one I have today, so my recollection might be imperfect. I was working with HP dual-socket Haswell hosts with 144G to 384G of memory, running RHEL 6.6 or 6.7. At the time I was experimenting with scripts like https://raw.githubusercontent.com/pixelb/ps_mem/master/ps_mem.py which try to come up with better estimates of "real" memory usage per process. The hosts had Azul Systems Zing installed, which meant a kernel module that uses a different kind of VMA than the malloc/glibc VMA for large-heap JVM processes. I saw that I could crash hosts almost at will by traversing /proc/<pid>/smaps for processes that were using between 5G and 40G.
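For anyone who hasn't looked at what such scripts do, here is a minimal sketch of the kind of traversal I mean. It is not the actual ps_mem.py code, and the function names (sum_smaps_fields, process_memory_kb) are hypothetical. It assumes the usual smaps layout: one header line per mapping, followed by "Name:  <n> kB" field lines. Note that this is exactly the kind of read that triggered the problem on those hosts, so I would try it on a test box first.

```python
from collections import defaultdict


def sum_smaps_fields(lines):
    """Sum every 'Name:  <n> kB' field across all mappings in smaps output."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        # Field lines look like "Pss:   12 kB". Mapping header lines
        # (address ranges) and "VmFlags:" lines do not match this shape.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            totals[parts[0][:-1]] += int(parts[1])
    return dict(totals)


def process_memory_kb(pid):
    """Read /proc/<pid>/smaps line by line and return per-field totals in kB."""
    with open(f"/proc/{pid}/smaps") as smaps:
        return sum_smaps_fields(smaps)
```

Calling process_memory_kb(pid)["Pss"] gives the proportional set size in kB, which is the sort of "real" usage figure these scripts aim for. The kernel has to walk every mapping of the target process to produce the file, so the cost grows with the size and complexity of the address space.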
At the time I found Red Hat problem reports describing similar issues, which attributed them to locking in the procfs implementation. This report might even refer to the issue I saw: https://access.redhat.com/solutions/441543

On Friday, March 22, 2019 at 3:41:33 AM UTC-4, Abhinav wrote:
>
> Hey Peter,
> By any chance, do you have a source for #2, or a configuration that can
> help reproduce the host crash?
>
> On Fri, Mar 22, 2019, 11:08 Peter Booth <[email protected]> wrote:
>
>> A couple of comments:
>>
>> 1. Brendan Gregg's homepage and his last book are worth reading:
>> http://www.brendangregg.com/
>> 2. Not everything in the /proc pseudofilesystem can be read at no cost.
>> For processes with large, complex memory footprints, reading
>> /proc/<pid>/smaps can be expensive and can, in some cases, crash hosts.
>>
>> Peter
>>
>> On Thursday, March 21, 2019 at 7:48:41 AM UTC-4, Mani Sarkar wrote:
>>>
>>> As a thank you, I have added your suggestions to this page
>>>
>>> https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/cloud-devops-infra/README.md#cpu
>>>
>>> And also cited the mailing list.
>>>
>>> Please do share and feel free to create PRs to it.
>>>
>>> On Sat, 1 Dec 2018 at 12:03 Mani Sarkar <[email protected]> wrote:
>>>
>>>> Thanks everyone who responded, they are good starting points. These
>>>> look like they will work on Linux or macOS machines; what about Windows?
>>>>
>>>> Do you have any pointers for GPUs?
>>>>
>>>>
>>>> On Thu, 29 Nov 2018 22:19 Mani Sarkar, <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Haven't written here for a long time. I have a query about probing
>>>>> CPUs (via commands and/or programs) to find out vital information about
>>>>> them, mostly speed/performance related.
>>>>>
>>>>> I have put together this cheat-sheet for doing something like that for
>>>>> GPUs -
>>>>> https://gist.github.com/neomatrix369/256913dcf77cdbb5855dd2d7f5d81b84,
>>>>> and would like to do something similar for CPUs as well, covering all
>>>>> three OSes.
>>>>>
>>>>> I know the GPU list is missing a good deal for the MacOS and Windows.
>>>>> Some of you might say Macs do have GPUs, many do have.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Cheers
>>>>> Mani
>>>>> --
>>>>>
>>>>> @theNeomatrix369 <http://twitter.com/theNeomatrix369> | Blog
>>>>> <http://neomatrix369.wordpress.com/> | @adoptopenjdk
>>>>> <http://twitter.com/adoptopenjdk> @graalvm
>>>>> <http://twitter.com/graalvm> @graal <http://twitter.com/graal>
>>>>> @truffleruby <http://twitter.com/truffleruby> | Dev. communities |
>>>>> Bitbucket <https://bitbucket.org/neomatrix369> | Github
>>>>> <https://github.com/neomatrix369> | Slideshare
>>>>> <https://slideshare.net/neomatrix369> | LinkedIn
>>>>> <http://uk.linkedin.com/pub/mani-sarkar/71/a77/39b>
>>>>> Come to Devoxx UK 2019: http://www.devoxx.co.uk/
>>>>>
>>>>> Don't chase success, rather aim for "Excellence", and success will
>>>>> come chasing after you!
>>>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "mechanical-sympathy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>

--
To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/6b375211-fd58-46b7-836a-6043cb1b4e5c%40googlegroups.com.
