On Thu, Jan 21, 2021 at 10:38:59PM +0000, Anindya Mukherjee wrote:

> Hi,
>
> Just to follow up, I was playing with allocating memory from a test
> program in various ways in order to produce a situation when SIZE is
> less than RES. The following program causes this to happen. If I mmap a
> large file, the SIZE remains tiny, indicating that the mapped region is
> not counted as part of text + data + stack. Then when I go ahead and
> touch all the memory, SIZE remains tiny but RES grows to the size of the
> file. Very interesting.

So SIZE does not include mappings backed by a file system object, but
RES does. RES only grows once the pages are touched; this is demand
paging in action (anon pages act the same way).

Nice. I already suspected it would be something like that, but never
took the time to find out by experimenting or code study.

Now the next question is whether SIZE *should* include non-anonymous
pages. getrlimit(2) explicitly says RLIMIT_DATA (which limits SIZE)
only includes anonymous data. So that hints SIZE indeed should not
include those non-anonymous pages.

To back this up: ps(1) lists several size related stats:

Desc          Keyw   Function  Value
Data          dsiz   dsize     p_vm_dsize
Resident 1    rss    p_rssize  p_vm_rssize
Resident 2    rsz    rssize    p_vm_rssize
Stack         ssiz   ssize     p_vm_ssize
Text (code)   tsiz   tsize     p_vm_tsize
Virtual       vsiz   vsize     p_vm_dsize + p_vm_ssize + p_vm_tsize

top(1) uses the equivalent of vsiz for SIZE and rss for RES. So this
is consistent with your observations.

I note that the rss vs rsz distinction ps(1) mentions does not
actually seem to be implemented in ps(1).

BTW: the proper way to get the file size is to open the file and use
fstat(2).

	-Otto

>
> Quick and very dirty code below:
>
> /* This demonstrates why the SIZE column in OpenBSD top can be less than
>  * the RES column. This is because mmapped areas of virtual memory are
>  * not counted as text, data, or stack, but counted as part of the
>  * resident pages, when touched. The program maps a (preferably large)
>  * file and then waits for the user to examine the process memory
>  * statistics.
>  */
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> int main(int argc, char **argv)
> {
>     char ch;
>     char *pch;
>     void *result;
>     FILE *fp;
>     int fd;
>     size_t i;
>     size_t mapSize;
>     int current = 0;
>     const int increment = 10;
>     double percent;
>     double mapRatio;
>
>     if (argc < 2)
>     {
>         printf("No file name supplied.\n");
>         exit(1);
>     }
>
>     printf("About to mmap. Press Enter... ");
>     getchar();
>     fp = fopen(argv[1], "r");
>     if (fp == NULL)
>     {
>         perror(NULL);
>         exit(1);
>     }
>     fd = fileno(fp);
>
>     if (fseek(fp, 0, SEEK_END) == -1)
>     {
>         perror(NULL);
>         exit(1);
>     }
>     mapSize = ftell(fp);
>     if (mapSize == (size_t)-1)
>     {
>         perror(NULL);
>         exit(1);
>     }
>     if (fseek(fp, 0, SEEK_SET) == -1)
>     {
>         perror(NULL);
>         exit(1);
>     }
>     result = mmap(NULL, mapSize, PROT_READ, MAP_PRIVATE, fd, 0);
>     if (close(fd) == -1)
>     {
>         perror(NULL);
>         exit(1);
>     }
>     if (result == MAP_FAILED)
>     {
>         perror(NULL);
>         exit(1);
>     }
>     printf("%zu bytes mmapped at %p. Press Enter... ", mapSize, result);
>     getchar();
>
>     pch = (char *)result;
>     printf("Touching mapped memory... ");
>     mapRatio = 100.0 / mapSize;
>     for (i = 0; i < mapSize; i++)
>     {
>         ch = pch[i];
>         percent = (i + 1) * mapRatio;
>         if (current < percent)
>         {
>             while (current < percent)
>                 current += increment;
>             if (current > percent)
>                 current -= increment;
>             if (current < 100)
>             {
>                 printf("%d%%... ", current);
>                 fflush(stdout);
>                 current += increment;
>             }
>         }
>     }
>     printf("100%%\nRead done. Press Enter... ");
>     getchar();
>     if (munmap(result, mapSize) == -1)
>     {
>         perror(NULL);
>         exit(1);
>     }
>     return 0;
> }
>
> Anindya
>
>
> From: Anindya Mukherjee <anindy...@hotmail.com>
> Sent: January 17, 2021 6:35 PM
> To: Otto Moerbeek <o...@drijf.net>
> Cc: misc@openbsd.org <misc@openbsd.org>
> Subject: Re: Understanding memory statistics
>
> Hi,
>
> I had a look at the code for top and some of the VM code. I think I have
> a few more answers.
>
> The easiest one is the calculation for the tot number in top:
> https://github.com/openbsd/src/blob/d098acee57f5a5eacb13200c49034ecb8cbd8c29/usr.bin/top/machine.c#L293
> Here we see that it is calculated as the total page count - free page
> count.
>
> If we calculate delta = tot - active - inactive - cache - wired, we see
> that it is still not zero. Typically it is a few hundred MB on my
> system. This might be some "dynamic" memory being allocated by the
> kernel?
> I don't know what that means :)
>
> I am not totally sure, but from looking at the code, I suspect that
> SIZE, which ultimately comes from struct vmspace, does not take into
> account shared memory mappings, or at least not all of them. The text,
> data, and stack sizes are added up here:
> https://github.com/openbsd/src/blob/d098acee57f5a5eacb13200c49034ecb8cbd8c29/usr.bin/top/machine.c#L70
>
> However, I think the RES parameter also takes into account shared memory
> mappings. This can explain why it is often higher than SIZE,
> particularly for large programs.
>
> Anindya
>
>
> From: Anindya Mukherjee <anindy...@hotmail.com>
> Sent: January 12, 2021 3:22 PM
> To: Otto Moerbeek <o...@drijf.net>
> Cc: misc@openbsd.org <misc@openbsd.org>
> Subject: Re: Understanding memory statistics
>
> Hi Otto,
>
> Thank you for your kind reply and explanations. They helped me
> understand a few more things. I have some basic familiarity with the
> concepts but not so much with OpenBSD internals, although I have been
> using it. I need to research a bit more, but in my next reply I'll try
> to answer my questions, with some examples.
>
> I love OpenBSD and can program in C, so I think given time I'll be able
> to make some contributions to it. I have been working on tmux and it's
> been a lot of fun. I got a lot of help and encouragement from Nicholas
> Marriott.
>
> Best,
> Anindya
>
> From: Otto Moerbeek <o...@drijf.net>
> Sent: January 10, 2021 11:42 PM
> To: Anindya Mukherjee <anindy...@hotmail.com>
> Cc: misc@openbsd.org <misc@openbsd.org>
> Subject: Re: Understanding memory statistics
>
> On Sun, Jan 10, 2021 at 09:34:49PM +0000, Anindya Mukherjee wrote:
>
> > Hi, I'm trying to understand the various numbers reported for memory
> > usage from top, vmstat, and systat. I'm running OpenBSD 6.8 on a Dell
> > Optiplex 7040 with an i7 6700, 3.4 GHz and 32 GB RAM. The GPU is an
> > Intel HD Graphics 530, integrated. Everything is running smoothly.
> > For my own edification, I have a few questions. I searched the mailing
> > lists for similar questions in the past, and found some, but they did
> > not fully satisfy my curiosity.
> >
> > dmesg reports:
> > real mem = 34201006080 (32616MB)
> > avail mem = 33149427712 (31613MB)
> > I think the difference is due to the GPU reserving some memory.
>
> That might be; I think it at least includes mem used by the kernel
> for its code and static data.
>
> > Q: Is there a way to view the total amount of video memory, the amount
> > currently being used, and the GPU usage?
>
> AFAIK not. Some BIOSes have settings for the video mem used (if you
> have it shared with main mem).
>
> >
> > When I run top, it reports the following memory usage:
> > Memory: Real: 1497M/4672M act/tot Free: 26G Cache: 2236M Swap: 0K/11G
> > If I sum up the RES numbers for all the processes, it is close to the
> > act number = 1497 M (this is mostly due to Firefox). I read that the
> > cache number is included in tot, but even if I subtract cache and act
> > from tot there is 939 MB left.
> > Q: What is this 939 MB being used for, assuming the above makes sense?
>
> Inactive pages?
>
> > Q: What is the cache number indicating exactly?
>
> Memory used for file system caching.
>
> >
> > If I sum up tot + free * 1024 I get 31296 MB, which is less than the
> > 31613 MB of available memory reported by dmesg. I initially assumed
> > that the difference might be kernel wired memory. However the uvm view
> > of systat shows 7514 wired pages = approx 30 MB which is very small.
> > Q: What is the remaining memory being used for?
>
> I think you are looking at dynamic allocations done by the kernel.
>
> > Q: What is in kernel wired memory? In particular, is the file system
> > cache in kernel wired memory or in the cache number?
>
> Kernel wired means data pages allocated by the kernel that will not be
> paged out.
> The file system mem will also not be paged out (when evicting those
> pages they are discarded if not dirty, or written to the file if
> dirty), but the file system cache pages are not in the wired count.
>
> > In the man page for systat(1) the active memory is described as being
> > used by processes active in the last 20 seconds (recently), while the
> > total is for all processes. These are the same two numbers as act and
> > tot in top, and act = avm as reported by vmstat. This confused me
> > because adding up the RES sizes of all the processes I get nowhere
> > near tot (even after subtracting cache).
>
> Accounting of shared pages is hard and ambiguous. To illustrate: if
> you switch on S in top, you'll see a bunch of kernel space processes
> all at SIZE 0 and the same RES size. They do share the same (kernel)
> memory.
>
> >
> > There is another thing that confused me in the top output. At first I
> > assumed that SIZE is the total virtual memory size of the process
> > (allocated), while RES is the resident size. For example, this is so on
> > Linux and hence in that case by definition SIZE should always be greater
> > than RES. However here in many cases SIZE < RES.
>
> I am unsure how that is caused. It is possibly a shared pages thing.
>
> >
> > I read in the man page for top that SIZE is actually text + data + stack
> > for the process. However this did not clear up my confusion or
> > misunderstanding. Perhaps something to do with shared memory not being
> > counted?
> > Q: How can SIZE be less than RES? An example showing how this could
> > happen would be really helpful.
>
> I guess do some experimenting and code analysis, and share your findings.
>
> >
> > Q: Finally, where can I find documentation about the classification for
> > memory pages (active, inactive, wired, etc.)? I suspect some digging
> > around in the source is in order, but could use some pointers.
>
> The start would be man uvm_init. But the rest is code.
>
> >
> > I hope these make sense and are not too pedantic. Looking forward to
> > comments from the experts, thanks!
> >
> > Anindya Mukherjee
>
> 	-Otto
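
As a footnote to the fstat(2) remark in the reply at the top of this
message, here is a minimal sketch of how the test program could take the
file size from the open descriptor instead of using fseek/ftell. It is
not code from the thread: the helper names map_file_ro, touch_pages and
pages_spanned are hypothetical, and touching one byte per page (rather
than every byte) is an assumption that this is enough to fault each page
in and make RES grow.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Number of pages needed to cover len bytes (hypothetical helper). */
size_t
pages_spanned(size_t len, size_t pagesize)
{
	assert(pagesize > 0);
	return (len + pagesize - 1) / pagesize;
}

/*
 * Map a whole file read-only. The length comes from fstat(2) on the
 * open descriptor. Returns NULL on any failure, including an empty
 * file (hypothetical helper).
 */
void *
map_file_ro(const char *path, size_t *lenp)
{
	struct stat sb;
	void *p;
	int fd;

	if ((fd = open(path, O_RDONLY)) == -1)
		return NULL;
	if (fstat(fd, &sb) == -1 || sb.st_size <= 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after the close */
	if (p == MAP_FAILED)
		return NULL;
	*lenp = (size_t)sb.st_size;
	return p;
}

/*
 * Read one byte from each page of the mapping. The volatile pointer
 * keeps the compiler from dropping the reads. Returns the sum of the
 * bytes read, only so that the result is used.
 */
unsigned long
touch_pages(const void *p, size_t len, size_t pagesize)
{
	const volatile unsigned char *c = p;
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < len; i += pagesize)
		sum += c[i];
	return sum;
}
```

A main() along the lines of the original program would call
map_file_ro(argv[1], &len), wait on getchar() so SIZE and RES can be
checked in top, then call touch_pages(p, len, sysconf(_SC_PAGESIZE)) and
wait again before munmap(p, len).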