That makes sense, Martin. I will increase the cycles. In fact, I just got another false positive, this time for a different reason: ZFS file data, for which the same statement holds true. Memory used by ZFS will be given back to the system should another program need it.

___________ Thu Oct 23 22:16:46 EDT 2014 ___________
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     572844              2237    7%
ZFS File Data             6497060             25379   77%
Anon                       275913              1077    3%
Exec and libs               49635               193    1%
Page cache                 249593               974    3%
Free (cachelist)           612330              2391    7%
Free (freelist)            131104               512    2%
Total                     8388479             32767

Thanks again and best regards,

- Nestor

On Thu, Oct 23, 2014 at 2:41 PM, Martin Pala <[email protected]> wrote:

> Thanks for data.
>
> We use kstat to get freemem statistics ... it contains both freelist and
> cachelist. In your case the output of mdb ::memstat shows, that the memory
> usage was ca. 85% (7+6+10+1+61 = 85), which matches the monit test limit
> (> 80%). The memory usage is real, check the "sr" (page scanner activity)
> in vmstat to see if it's problem for the system.
>
> If the high memory usage is normal, you can adjust the test limit to
> suppress the alerts, you can also use the "for X cycles" option to alert
> only if the memory usage remains high for long time, for example:
>
>     if memory usage > 90% for 20 cycles then alert
>
> Regards,
> Martin
>
>
> > On 23 Oct 2014, at 15:36, Nestor Urquiza <[email protected]> wrote:
> >
> > Thanks a lot for this Martin.
> >
> > Here is what I got (using version 5.7)
> >
> > ___________ Thu Oct 23 01:15:10 EDT 2014 ___________
> > Page Summary                Pages                MB  %Tot
> > ------------     ----------------  ----------------  ----
> > Kernel                     588196              2297    7%
> > ZFS File Data              480826              1878    6%
> > Anon                       820560              3205   10%
> > Exec and libs               49531               193    1%
> > Page cache                5125284             20020   61%
> > Free (cachelist)          1009189              3942   12%
> > Free (freelist)            314893              1230
> >
> > The main culprit is a process used by a vendor product:
> >
> >    PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
> >  10592 geneva     37G   20G cpu5    20    0   0:10:32  11% newaga/1
> >
> > This machine has 32GB RAM so at first glance someone would say we either
> > increase memory or ask the vendor to provide some guidance on how to
> > limit memory usage by that process.
> >
> > However I am wondering if "page cache" should really be alarming?
> > According to Oracle
> > https://blogs.oracle.com/rmc/entry/the_vm_system_formally_known
> > "The cachelist operates as part of the freelist. When the freelist is
> > depleted, allocations are made from the oldest pages in the cachelist.
> > This allows the file system page cache to grow to consume all available
> > memory and to dynamically shrink as memory is required for other
> > purposes."
> >
> > In this case the newaga command is part of a replication script which
> > brings an in memory database from a remote server locally. This in memory
> > database works with memory segments that are replicated in disk and
> > loaded as needed. This system can even work with 16GB RAM. We increased
> > it because we were getting too many alerts from monit. In Solaris 10
> > (with the previous version of the same software) we used to have no
> > memory alerts from monit using 16GB RAM, same database, or kind of
> > because of course we changed both the OS and the version of the app.
> >
> > Bottom line I am now trying to understand if monit should be reporting
> > memory usage in a different way for Solaris 11 or the vendor should be
> > using memory in a different way or Solaris should be tweaked to please
> > alerts.
> >
> > Under normal operation BTW this is what we get:
> >
> > > ::memstat
> > Page Summary                Pages                MB  %Tot
> > ------------     ----------------  ----------------  ----
> > Kernel                     585743              2288    7%
> > ZFS File Data              861077              3363   10%
> > Anon                       793486              3099    9%
> > Exec and libs               45752               178    1%
> > Page cache                 259302              1012    3%
> > Free (cachelist)          4301112             16801   51%
> > Free (freelist)           1542007              6023   18%
> > Total                     8388479             32767
> >
> > Thanks again for your help with this!
> >
> > - Nestor
> >
> >
> > On Thu, Oct 23, 2014 at 5:24 AM, Martin Pala <[email protected]> wrote:
> > You can use the prstat exec action too, just remove the "-s rss" option
> > to let it sort the output by CPU usage (default)
> >
> > Regards,
> > Martin
> >
> >
> >> On 22 Oct 2014, at 18:58, Nestor Urquiza <[email protected]> wrote:
> >>
> >> Thanks for this Martin,
> >>
> >> I will keep you posted now that I installed 5.7 and put the command in
> >> monitrc as recommended.
> >>
> >> We are also getting some alerts for CPU usage spikes. Do you have a
> >> recommendation for the command to run when getting those as well?
> >>
> >> Thanks!
> >> - Nestor
> >>
> >> On Wed, Oct 22, 2014 at 3:33 AM, Martin Pala <[email protected]> wrote:
> >> Hi Nestor,
> >>
> >> you can use something like this to get the distribution (will record
> >> the memstat output + user space distribution ... processes by RSS):
> >>
> >>     if memory usage > 80% then exec "/bin/sh -c 'exec >> /tmp/memstat.$$; echo ___________ `date` ___________; echo ::memstat | sudo mdb -k; prstat -c -s rss 1 10'"
> >>
> >>
> >> There was fix for memory usage report for Solaris in Monit 5.7 ...
> >> please can you upgrade to Monit 5.9? If the problem will persist - is
> >> the system where Monit is running 32-bit or 64-bit? Is it the Solaris
> >> zone?
> >>
> >>
> >> Regards,
> >> Martin
> >>
> >>
> >> > On 20 Oct 2014, at 22:04, Nestor Urquiza <[email protected]> wrote:
> >> >
> >> > Hi Martin,
> >> >
> >> > Is there a way to put monit in debug mode so we get more information
> >> > about the memory distribution at the moment of the alert?
> >> >
> >> > One thing we have noticed is that regardless how many cycles we wait
> >> > to alert, the succeed message comes in the next cycle after the alert
> >> > which is really weird.
> >> >
> >> > Thanks,
> >> >
> >> > - Nestor
> >> >
> >> > On Sun, Oct 19, 2014 at 12:32 PM, Nestor Urquiza <[email protected]> wrote:
> >> > I am sorry about the examples but yes we do get memory utilization
> >> > spikes:
> >> >
> >> > "mem usage of 82.6% matches resource limit [mem usage>80.0%],"
> >> >
> >> > It is difficult to get that information at the time of the alert
> >> > though. Is there a way to put monit on debug mode or something to get
> >> > exactly the memory utilization distribution?
> >> >
> >> > Right now everything is alright:
> >> >
> >> > $ sudo monit status
> >> > ...
> >> > System 'server'
> >> >   status                            Running
> >> >   monitoring status                 Monitored
> >> >   load average                      [0.13] [0.12] [0.11]
> >> >   cpu                               0.3%us 1.4%sy 0.0%wa
> >> >   memory usage                      11822268 kB [35.2%]
> >> >   swap usage                        0 kB [0.0%]
> >> >   data collected                    Sun, 19 Oct 2014 12:23:47
> >> > ...
> >> >
> >> > $ echo ::memstat | sudo mdb -k
> >> > Page Summary                Pages                MB  %Tot
> >> > ------------     ----------------  ----------------  ----
> >> > Kernel                     591587              2310    7%
> >> > ZFS File Data             1089502              4255   13%
> >> > Anon                       999345              3903   12%
> >> > Exec and libs               50239               196    1%
> >> > Page cache                 249081               972    3%
> >> > Free (cachelist)          3821104             14926   46%
> >> > Free (freelist)           1587621              6201   19%
> >> > Total                     8388479             32767
> >> >
> >> > Thanks,
> >> >
> >> > - Nestor
> >> >
> >> >
> >> > On Sat, Oct 18, 2014 at 4:22 PM, Martin Pala <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > the attached error message ("cpu system usage ...") is for CPU test
> >> > ... not related to memory usage. High "cpu system" usage may be for
> >> > example sign of heavy disk I/O activity and/or swapping (memory
> >> > shortage) - check vmstat output for details.
> >> >
> >> > If the memory usage report is problem, please can you provide output
> >> > of "echo ::memstat | mdb -k" and "monit status" (just the System
> >> > service part is sufficient).
> >> >
> >> >
> >> > Regards,
> >> > Martin
> >> >
> >> >
> >> > > On 16 Oct 2014, at 16:41, Nestor Urquiza <[email protected]> wrote:
> >> > >
> >> > > Hi guys,
> >> > >
> >> > > Since we went from Solaris 10 to 11 we have seen an increase in
> >> > > monit alerts related to memory resource utilization. We used to get
> >> > > no alerts even when we set the memory threshold really low, for
> >> > > example:
> >> > >
> >> > > "...cpu system usage of 45.8% matches resource limit [cpu system
> >> > > usage>40.0%]"
> >> > >
> >> > >
> >> > > We have incremented the threshold to 90% but still we get alerts.
> >> > >
> >> > > Could it be that the way monit decides what is free memory in
> >> > > Solaris is incorrect when using ZFS
> >> > > http://serverfault.com/questions/378392/how-should-i-monitor-memory-usage-performance-in-sunos-solaris
> >> > >
> >> > > We are running monit version 5.5 BTW which has been working fine
> >> > > for ages.
> >> > >
> >> > > Perhaps version 5.9 has done something in that regard as I read the
> >> > > release notes ( http://mmonit.com/monit/changes/ ) are allowing to
> >> > > monitor generic device strings (not related really but worth to
> >> > > ask).
> >> > >
> >> > > Thanks!
> >> > >
> >> > > - Nestor
> >> > >
> >> > > --
> >> > > To unsubscribe:
> >> > > https://lists.nongnu.org/mailman/listinfo/monit-general
>
-- To unsubscribe: https://lists.nongnu.org/mailman/listinfo/monit-general
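
For reference, the arithmetic Martin describes (summing every non-free ::memstat category, as in 7+6+10+1+61 = 85) can be reproduced from the 22:16:46 page counts quoted at the top of this message. This is only a sketch: the page counts are copied from that output, the names MEMSTAT_PAGES and usage_percent and the reclaimable parameter are hypothetical, and treating "ZFS File Data" as reclaimable is an assumption taken from the discussion, not a description of how Monit computes usage internally.

```python
# Page counts copied from the "Thu Oct 23 22:16:46 EDT 2014" ::memstat output.
MEMSTAT_PAGES = {
    "Kernel": 572844,
    "ZFS File Data": 6497060,
    "Anon": 275913,
    "Exec and libs": 49635,
    "Page cache": 249593,
    "Free (cachelist)": 612330,
    "Free (freelist)": 131104,
}

# Categories that ::memstat itself labels as free.
FREE_LISTS = ("Free (cachelist)", "Free (freelist)")


def usage_percent(pages, reclaimable=FREE_LISTS):
    """Return used memory as a percentage of all pages.

    Every category not named in `reclaimable` counts as used.
    `reclaimable` is a hypothetical knob for this sketch only.
    """
    total = sum(pages.values())
    used = sum(count for name, count in pages.items() if name not in reclaimable)
    return 100.0 * used / total


if __name__ == "__main__":
    # Only the free lists count as available (the sum Martin describes).
    print(f"free lists only:       {usage_percent(MEMSTAT_PAGES):.1f}%")
    # Assumption: also treat ZFS file data as memory the system can reclaim.
    with_zfs = FREE_LISTS + ("ZFS File Data",)
    print(f"ZFS data reclaimable:  {usage_percent(MEMSTAT_PAGES, with_zfs):.1f}%")
```

With these numbers the first figure is well above the 80% limit while the second is far below it, which is the gap between the alert and the "false positive" reading discussed in this thread.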
