On Thu, Jan 21, 2021 at 10:38:59PM +0000, Anindya Mukherjee wrote:

> Hi,
> 
> Just to follow up, I was playing with allocating memory from a test
> program in various ways in order to produce a situation when SIZE is
> less than RES. The following program causes this to happen. If I mmap a
> large file, the SIZE remains tiny, indicating that the mapped region is
> not counted as part of text + data + stack. Then when I go ahead and
> touch all the memory, SIZE remains tiny but RES grows to the size of the
> file. Very interesting.

So SIZE does not include mappings backed by a file system object, but
RES does. RES only grows once the pages are touched; this is demand
paging in action (anon pages act the same way).

Nice. I already suspected it would be something like that, but never took
the time to find out by experimenting or studying the code.

Now the next question is whether SIZE *should* include non-anonymous
pages. getrlimit(2) explicitly says RLIMIT_DATA (which limits
SIZE) only includes anonymous data. So that hints SIZE indeed should not
include those non-anonymous pages.

To back this up:

ps(1) lists several size-related stats:

Desc            Keyw    Function        Value
Data            dsiz    dsize           p_vm_dsize
Resident 1      rss     p_rssize        p_vm_rssize
Resident 2      rsz     rssize          p_vm_rssize
Stack           ssiz    ssize           p_vm_ssize
Text (code)     tsiz    tsize           p_vm_tsize
Virtual         vsiz    vsize           p_vm_dsize + p_vm_ssize + p_vm_tsize


top(1) uses the equivalent of vsiz for SIZE and rss for RES. So this
is consistent with your observations.
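As an illustration of the table above, here is a minimal sketch of how
the two columns relate. struct vm_counts is a hypothetical stand-in
whose fields only mirror the p_vm_* names; it is not the real kernel or
ps(1) structure, and all values are in pages.

```c
/* Hypothetical per-process counters, in pages. */
struct vm_counts {
	long tsize;	/* text,     cf. p_vm_tsize */
	long dsize;	/* data,     cf. p_vm_dsize */
	long ssize;	/* stack,    cf. p_vm_ssize */
	long rssize;	/* resident, cf. p_vm_rssize */
};

/* SIZE as described above: text + data + stack (the vsiz sum). */
long
size_pages(const struct vm_counts *c)
{
	return c->tsize + c->dsize + c->ssize;
}

/* RES as described above: the resident page count (rss). */
long
res_pages(const struct vm_counts *c)
{
	return c->rssize;
}
```

With made-up values such as tsize 50, dsize 200, ssize 10 and rssize
1200 (a process that has touched a large file mapping), size_pages()
gives 260 while res_pages() gives 1200, so SIZE < RES.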

I note that the rss vs rsz distinction ps(1) mentions does not
actually seem to be implemented in ps(1).

BTW: the proper way to get the file size is to open the file and use fstat(2).
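A minimal sketch of the fstat(2) approach, as a replacement for the
fseek/ftell sequence in the quoted program. open_with_size() is a
hypothetical helper name; it returns the open descriptor so it can be
handed straight to mmap(2).

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open path read-only and store its size in bytes in *size.
 * Returns the file descriptor, or -1 on error (errno is set by
 * open(2) or fstat(2)).
 */
int
open_with_size(const char *path, off_t *size)
{
	struct stat sb;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;
	if (fstat(fd, &sb) == -1) {
		close(fd);
		return -1;
	}
	*size = sb.st_size;
	return fd;
}
```

The caller would check for -1, pass (size_t)size and the descriptor to
mmap(2), and close the descriptor afterwards.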

        -Otto


> 
> Quick and very dirty code below:
> 
> /* This demonstrates why the SIZE column in OpenBSD top can be less than
>  * the RES column. This is because mmapped areas of virtual memory are
>  * not counted as text, data, or stack, but counted as part of the
>  * resident pages, when touched. The program maps a (preferably large)
>  * file and then waits for the user to examine the process memory
>  * statistics.
>  */
> 
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
> 
> int main(int argc, char **argv)
> {
>       char ch;
>       char *pch;
>       void *result;
>       FILE *fp;
>       int fd;
>       size_t i;
>       size_t mapSize;
>       int current = 0;
>       const int increment = 10;
>       double percent;
>       double mapRatio;
> 
>       if (argc < 2)
>       {
>               printf("No file name supplied.\n");
>               exit(1);
>       }
> 
>       printf("About to mmap. Press Enter... ");
>       getchar();
>       fp = fopen(argv[1], "r");
>       if (fp == NULL)
>       {
>               perror(NULL);
>               exit(1);
>       }
>       fd = fileno(fp);
> 
>       if (fseek(fp, 0, SEEK_END) == -1)
>       {
>               perror(NULL);
>               exit(1);
>       }
>       mapSize = ftell(fp);
>       if (mapSize == -1)
>       {
>               perror(NULL);
>               exit(1);
>       }
>       if (fseek(fp, 0, SEEK_SET) == -1)
>       {
>               perror(NULL);
>               exit(1);
>       }
>       result = mmap(NULL, mapSize, PROT_READ, MAP_PRIVATE, fd, 0);
>       if(close(fd) == -1)
>       {
>               perror(NULL);
>               exit(1);
>       }
>       if (result == MAP_FAILED)
>       {
>               perror(NULL);
>               exit(1);
>       }
>       printf("%zu bytes mmapped at %p. Press Enter... ", mapSize, result);
>       getchar();
> 
>       pch = (char *)result;
>       printf("Touching mapped memory... ");
>       mapRatio = 100.0 / mapSize;
>       for (i = 0; i < mapSize; i++)
>       {
>               ch = pch[i];
>               percent = (i + 1) * mapRatio;
>               if (current < percent)
>               {
>                       while (current < percent)
>                               current+= increment;
>                       if (current > percent)
>                               current -=increment;
>                       if (current < 100)
>                       {
>                               printf("%d%%... ", current);
>                               fflush(stdout);
>                               current+= increment;
>                       }
>               }
>       }
>       printf("100%%\nRead done. Press Enter... ");
>       getchar();
>       if(munmap(result, mapSize) == -1)
>       {
>               perror(NULL);
>               exit(1);
>       }
>       return 0;
> }
> 
> Anindya
> 
> 
> 
> From: Anindya Mukherjee <anindy...@hotmail.com>
> Sent: January 17, 2021 6:35 PM
> To: Otto Moerbeek <o...@drijf.net>
> Cc: misc@openbsd.org <misc@openbsd.org>
> Subject: Re: Understanding memory statistics 
>  
> Hi,
> 
> I had a look at the code for top and some of the VM code. I think I have
> a few more answers.
> 
> The easiest one is the calculation for the tot number in top:
> https://github.com/openbsd/src/blob/d098acee57f5a5eacb13200c49034ecb8cbd8c29/usr.bin/top/machine.c#L293
> Here we see that it is calculated as the total page count - free page
> count.
> 
> If we calculate delta = tot - active - inactive - cache - wired, we see
> that it is still not zero. Typically it is a few hundred MB on my
> system. This might be some "dynamic" memory being allocated by the
> kernel? I don't know what that means :)
> 
> I am not totally sure, but from looking at the code, I suspect that the
> SIZE which ultimately comes from struct vmspace does not take into
> account shared memory mappings, or at least not all of them. The text,
> data, and stack sizes are added up here:
> https://github.com/openbsd/src/blob/d098acee57f5a5eacb13200c49034ecb8cbd8c29/usr.bin/top/machine.c#L70
> 
> However, I think the RES parameter also takes into account shared memory
> mappings. This can explain why it is often higher than SIZE,
> particularly for large programs.
> 
> Anindya
> 
> 
> 
> From: Anindya Mukherjee <anindy...@hotmail.com>
> Sent: January 12, 2021 3:22 PM
> To: Otto Moerbeek <o...@drijf.net>
> Cc: misc@openbsd.org <misc@openbsd.org>
> Subject: Re: Understanding memory statistics 
>  
> Hi Otto,
> 
> Thank you for your kind reply and explanations. They helped me
> understand a few more things. I have some basic familiarity with the
> concepts but not so much with OpenBSD internals, although I have been
> using it. I need to research a bit more, but in my next reply I'll try
> to answer my questions, with some examples.
> 
> I love OpenBSD and can program in C, so I think given time I'll be able
> to make some contributions to it. I have been working on tmux and it's
> been a lot of fun. I got a lot of help and encouragement from Nicholas
> Marriott.
> 
> Best,
> Anindya
> 
> From: Otto Moerbeek <o...@drijf.net>
> Sent: January 10, 2021 11:42 PM
> To: Anindya Mukherjee <anindy...@hotmail.com>
> Cc: misc@openbsd.org <misc@openbsd.org>
> Subject: Re: Understanding memory statistics 
>  
> On Sun, Jan 10, 2021 at 09:34:49PM +0000, Anindya Mukherjee wrote:
> 
> > Hi, I'm trying to understand the various numbers reported for memory
> > usage from top, vmstat, and systat. I'm running OpenBSD 6.8 on a Dell
> > Optiplex 7040 with an i7 6700, 3.4 GHz and 32 GB RAM. The GPU is an
> > Intel HD Graphics 530, integrated. Everything is running smoothly. For
> > my own edification, I have a few questions. I searched the mailing lists
> > for similar questions in the past, and found some, but they did not
> > fully satisfy my curiosity.
> > 
> > dmesg reports:
> > real mem = 34201006080 (32616MB)
> > avail mem = 33149427712 (31613MB)
> > I think the difference is due to the GPU reserving some memory.
> 
> That might be, I think it at least includes mem used by the kernel
> for its code and static data.
> 
> > Q: Is there a way to view the total amount of video memory, the amount
> > currently being used, and the GPU usage?
> 
> AFAIK not. Some BIOSes have settings for the video mem used (if you
> have it shared with main mem).
> 
> > 
> > When I run top, it reports the following memory usage:
> > Memory: Real: 1497M/4672M act/tot Free: 26G Cache: 2236M Swap: 0K/11G
> > If I sum up the RES numbers for all the processes, it is close to the
> > act number = 1497 M (this is mostly due to Firefox). I read that the
> > cache number is included in tot, but even if I subtract cache and act
> > from tot there is 939 MB left.
> > Q: What is this 939 MB being used for, assuming the above makes sense?
> 
> inactive pages?
> 
> > Q: What is the cache number indicating exactly?
> 
> Memory used for file system caching.
> 
> > 
> > If I sum up tot + free * 1024 I get 31296 MB, which is less than the 31613
> > MB of available memory reported by dmesg. I initially assumed that the
> > difference might be kernel wired memory. However the uvm view of systat
> > shows 7514 wired pages = approx 30 MB which is very small.
> > Q: What is the remaining memory being used for?
> 
> I think you are looking at dynamic allocations done by the kernel.
> 
> > Q: What is in kernel wired memory? In particular, is the file system
> > cache in kernel wired memory or in the cache number?
> 
> Kernel wired means data pages allocated by the kernel that will not be
> paged out. The file system mem will also not be paged out (when
> evicting those pages they are discarded if not dirty, or written to the
> file if dirty), but the file system cache pages are not in the wired count.
> 
> > In the man page for systat(1) the active memory is described as being
> > used by processes active in the last 20 seconds (recently), while the
> > total is for all processes. These are the same two numbers as act and
> > tot in top, and act = avm as reported by vmstat. This confused me
> > because adding up the RES sizes of all the processes I get nowhere near
> > to tot (even after subtracting cache).
> 
> Accounting of shared pages is hard and ambiguous. To illustrate: if
> you switch on S in top, you'll see a bunch of kernel space processes all
> at SIZE 0 and the same RES size. They do share the same (kernel)
> memory.
> 
> > 
> > There is another thing that confused me in the top output. At first I
> > assumed that SIZE is the total virtual memory size of the process
> > (allocated), while RES is the resident size. For example, this is so on
> > Linux and hence in that case by definition SIZE should always be greater
> > than RES. However here in many cases SIZE < RES.
> 
> I am unsure how that is caused. It is possibly a shared pages thing.
> 
> > 
> > I read in the man page for top that SIZE is actually text + data + stack
> > for the process. However this did not clear up my confusion or
> > misunderstanding. Perhaps something to do with shared memory not being
> > counted?
> > Q: How can SIZE be less than RES? An example showing how this could
> > happen would be really helpful.
> 
> I guess do some experimenting and code analysis, and share your findings.
> 
> > 
> > Q: Finally, where can I find documentation about the classification for
> > memory pages (active, inactive, wired, etc.)? I suspect some digging
> > around in the source is in order, but could use some pointers.
> 
> The start would be man uvm_init. But the rest is code.
> 
> > 
> > I hope these make sense and are not too pedantic. Looking forward to
> > comments from the experts, thanks!
> > 
> > Anindya Mukherjee
> > 
> 
>         -Otto
