That makes sense, Martin. I will increase the cycles. In fact, I just got
another false positive, this time for a different reason: ZFS file data, for
which the same statement holds true. Memory used by ZFS will be given back
to the system should another program need it.

___________ Thu Oct 23 22:16:46 EDT 2014 ___________

Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     572844              2237    7%
ZFS File Data             6497060             25379   77%
Anon                       275913              1077    3%
Exec and libs               49635               193    1%
Page cache                 249593               974    3%
Free (cachelist)           612330              2391    7%
Free (freelist)            131104               512    2%
Total                     8388479             32767
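
As a rough cross-check of the table above, here is a minimal Python sketch
that recomputes the used-memory percentage from the ::memstat rows. It
assumes, following Martin's description of the kstat freemem figure, that
both the freelist and the cachelist count as free memory. Whether monit
computes its number exactly this way is an assumption, and the names
parse_memstat, used_percent and the reclaimable argument are hypothetical
helpers for illustration, not part of monit or mdb.

```python
import re

# ::memstat output copied from the table above.
MEMSTAT = """\
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     572844              2237    7%
ZFS File Data             6497060             25379   77%
Anon                       275913              1077    3%
Exec and libs               49635               193    1%
Page cache                 249593               974    3%
Free (cachelist)           612330              2391    7%
Free (freelist)            131104               512    2%
Total                     8388479             32767
"""

# A data row is: name, two or more spaces, page count, MB, optional %Tot.
ROW = re.compile(r"^(?P<name>\S.*?)\s{2,}(?P<pages>\d+)\s+(?P<mb>\d+)(?:\s+\d+%)?\s*$")


def parse_memstat(text):
    """Return {row name: page count}; header and separator lines are skipped."""
    pages = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            pages[match.group("name")] = int(match.group("pages"))
    return pages


def used_percent(pages, reclaimable=()):
    """Used memory as a percentage of Total.

    Assumption: freelist and cachelist are both free. Rows named in
    `reclaimable` (hypothetical option) are also treated as free.
    """
    total = pages["Total"]
    free = pages["Free (cachelist)"] + pages["Free (freelist)"]
    used = total - free - sum(pages[name] for name in reclaimable)
    return 100.0 * used / total


if __name__ == "__main__":
    pages = parse_memstat(MEMSTAT)
    print("used: %.1f%%" % used_percent(pages))
    print("used, ZFS file data treated as reclaimable: %.1f%%"
          % used_percent(pages, reclaimable=("ZFS File Data",)))
```

With these figures the first value comes to roughly 91% and the second to
roughly 14%, which is why the alert looks like a false positive if ZFS file
data is considered reclaimable.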


Thanks again and best regards,

- Nestor

On Thu, Oct 23, 2014 at 2:41 PM, Martin Pala <[email protected]> wrote:

> Thanks for the data.
>
> We use kstat to get freemem statistics; it contains both the freelist and
> the cachelist. In your case the output of mdb ::memstat shows that the
> memory usage was ca. 85% (7+6+10+1+61 = 85), which matches the monit test
> limit (> 80%). The memory usage is real; check the "sr" (page scanner
> activity) column in vmstat to see if it's a problem for the system.
>
> If the high memory usage is normal, you can adjust the test limit to
> suppress the alerts. You can also use the "for X cycles" option to alert
> only if the memory usage remains high for a long time, for example:
>
>         if memory usage > 90% for 20 cycles then alert
>
> Regards,
> Martin
>
>
> > On 23 Oct 2014, at 15:36, Nestor Urquiza <[email protected]> wrote:
> >
> > Thanks a lot for this Martin.
> >
> > Here is what I got (using version 5.7)
> > ___________ Thu Oct 23 01:15:10 EDT 2014 ___________
> >
> > Page Summary                Pages                MB  %Tot
> > ------------     ----------------  ----------------  ----
> > Kernel                     588196              2297    7%
> > ZFS File Data              480826              1878    6%
> > Anon                       820560              3205   10%
> > Exec and libs               49531               193    1%
> > Page cache                5125284             20020   61%
> > Free (cachelist)          1009189              3942   12%
> > Free (freelist)            314893              1230
> >
> >
> >
> > The main culprit is a process used by a vendor product:
> >
> >    PID USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP
> >  10592 geneva     37G   20G cpu5     20    0   0:10:32  11% newaga/1
> >
> >
> >
> > This machine has 32GB RAM, so at first glance one would say we should
> > either increase memory or ask the vendor to provide some guidance on how
> > to limit memory usage by that process.
> >
> > However, I am wondering whether "page cache" should really be alarming.
> > According to Oracle
> > https://blogs.oracle.com/rmc/entry/the_vm_system_formally_known "The
> > cachelist operates as part of the freelist. When the freelist is depleted,
> > allocations are made from the oldest pages in the cachelist. This allows
> > the file system page cache to grow to consume all available memory and to
> > dynamically shrink as memory is required for other purposes."
> >
> > In this case the newaga command is part of a replication script which
> > copies an in-memory database from a remote server to the local machine.
> > This in-memory database works with memory segments that are replicated on
> > disk and loaded as needed. This system can even work with 16GB RAM. We
> > increased it because we were getting too many alerts from monit. In
> > Solaris 10 (with the previous version of the same software) we used to
> > have no memory alerts from monit using 16GB RAM and the same database, or
> > roughly the same, since we changed both the OS and the version of the app.
> >
> > Bottom line: I am now trying to understand whether monit should report
> > memory usage differently for Solaris 11, the vendor should use memory
> > differently, or Solaris should be tuned to stop the alerts.
> >
> >
> >
> > Under normal operation BTW this is what we get:
> >
> > > ::memstat
> >
> > Page Summary                Pages                MB  %Tot
> > ------------     ----------------  ----------------  ----
> > Kernel                     585743              2288    7%
> > ZFS File Data              861077              3363   10%
> > Anon                       793486              3099    9%
> > Exec and libs               45752               178    1%
> > Page cache                 259302              1012    3%
> > Free (cachelist)          4301112             16801   51%
> > Free (freelist)           1542007              6023   18%
> > Total                     8388479             32767
> >
> >
> >
> > Thanks again for your help with this!
> >
> > - Nestor
> >
> >
> > On Thu, Oct 23, 2014 at 5:24 AM, Martin Pala <[email protected]> wrote:
> > You can use the prstat exec action too; just remove the "-s rss" option
> > to let it sort the output by CPU usage (the default).
> >
> > Regards,
> > Martin
> >
> >
> >> On 22 Oct 2014, at 18:58, Nestor Urquiza <[email protected]> wrote:
> >>
> >> Thanks for this Martin,
> >>
> >> I will keep you posted now that I installed 5.7 and put the command in
> >> monitrc as recommended.
> >>
> >> We are also getting some alerts for CPU usage spikes. Do you have a
> >> recommendation for the command to run when getting those as well?
> >>
> >> Thanks!
> >> - Nestor
> >>
> >> On Wed, Oct 22, 2014 at 3:33 AM, Martin Pala <[email protected]> wrote:
> >> Hi Nestor,
> >>
> >> you can use something like this to get the distribution (it will record
> >> the memstat output + user space distribution ... processes by RSS):
> >>
> >>         if memory usage > 80% then exec "/bin/sh -c 'exec >> /tmp/memstat.$$; echo ___________ `date` ___________; echo ::memstat | sudo mdb -k; prstat -c -s rss 1 10'"
> >>
> >>
> >> There was a fix for the memory usage report for Solaris in Monit 5.7 ...
> >> can you please upgrade to Monit 5.9? If the problem persists: is the
> >> system where Monit is running 32-bit or 64-bit? Is it a Solaris zone?
> >>
> >>
> >> Regards,
> >> Martin
> >>
> >>
> >> > On 20 Oct 2014, at 22:04, Nestor Urquiza <[email protected]> wrote:
> >> >
> >> > Hi Martin,
> >> >
> >> > Is there a way to put monit in debug mode so we get more information
> >> > about the memory distribution at the moment of the alert?
> >> >
> >> > One thing we have noticed is that regardless of how many cycles we wait
> >> > to alert, the success message comes in the next cycle after the alert,
> >> > which is really weird.
> >> >
> >> > Thanks,
> >> >
> >> > - Nestor
> >> >
> >> > On Sun, Oct 19, 2014 at 12:32 PM, Nestor Urquiza <[email protected]> wrote:
> >> > I am sorry about the examples, but yes, we do get memory utilization
> >> > spikes:
> >> >
> >> > "mem usage of 82.6% matches resource limit [mem usage>80.0%],"
> >> >
> >> > It is difficult to get that information at the time of the alert,
> >> > though. Is there a way to put monit in debug mode or something to get
> >> > the exact memory utilization distribution?
> >> >
> >> > Right now everything is alright:
> >> >
> >> > $ sudo monit status
> >> > ...
> >> > System 'server'
> >> >   status                            Running
> >> >   monitoring status                 Monitored
> >> >   load average                      [0.13] [0.12] [0.11]
> >> >   cpu                               0.3%us 1.4%sy 0.0%wa
> >> >   memory usage                      11822268 kB [35.2%]
> >> >   swap usage                        0 kB [0.0%]
> >> >   data collected                    Sun, 19 Oct 2014 12:23:47
> >> > ...
> >> >
> >> >
> >> >
> >> > $ echo ::memstat | sudo mdb -k
> >> >
> >> > Page Summary                Pages                MB  %Tot
> >> > ------------     ----------------  ----------------  ----
> >> > Kernel                     591587              2310    7%
> >> > ZFS File Data             1089502              4255   13%
> >> > Anon                       999345              3903   12%
> >> > Exec and libs               50239               196    1%
> >> > Page cache                 249081               972    3%
> >> > Free (cachelist)          3821104             14926   46%
> >> > Free (freelist)           1587621              6201   19%
> >> > Total                     8388479             32767
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > - Nestor
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Sat, Oct 18, 2014 at 4:22 PM, Martin Pala <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > the attached error message ("cpu system usage ...") is for the CPU test
> >> > ... not related to memory usage. High "cpu system" usage may be, for
> >> > example, a sign of heavy disk I/O activity and/or swapping (memory
> >> > shortage); check vmstat output for details.
> >> >
> >> > If the memory usage report is the problem, can you please provide the
> >> > output of "echo ::memstat | mdb -k" and "monit status" (just the System
> >> > service part is sufficient).
> >> >
> >> >
> >> > Regards,
> >> > Martin
> >> >
> >> >
> >> >
> >> > > On 16 Oct 2014, at 16:41, Nestor Urquiza <[email protected]> wrote:
> >> > >
> >> > > Hi guys,
> >> > >
> >> > > Since we went from Solaris 10 to 11 we have seen an increase in monit
> >> > > alerts related to memory resource utilization. We used to get no
> >> > > alerts even when we set the memory threshold really low, for example:
> >> > >
> >> > > "...cpu system usage of 45.8% matches resource limit [cpu system usage>40.0%]"
> >> > >
> >> > >
> >> > > We have incremented the threshold to 90% but still we get alerts.
> >> > >
> >> > > Could it be that the way monit decides what is free memory in
> >> > > Solaris is incorrect when using ZFS?
> >> > > http://serverfault.com/questions/378392/how-should-i-monitor-memory-usage-performance-in-sunos-solaris
> >> > >
> >> > > We are running monit version 5.5 BTW, which has been working fine
> >> > > for ages.
> >> > >
> >> > > Perhaps version 5.9 has done something in that regard? I read in the
> >> > > release notes ( http://mmonit.com/monit/changes/ ) that it allows
> >> > > monitoring generic device strings (not really related, but worth
> >> > > asking).
> >> > >
> >> > > Thanks!
> >> > >
> >> > > - Nestor
> >> > >
> >> > > --
> >> > > To unsubscribe:
> >> > > https://lists.nongnu.org/mailman/listinfo/monit-general