Hello,

I have a couple of FreeBSD boxes that are providing a captive portal wifi authentication system. Without delving into the implementation details, I'm running dhcpd, squid, and apache. We have in-house perl CGI scripts that handle session and IP management, dynamically creating and destroying netgraph nodes (ng_nat), connecting them to ipfw (ng_ipfw), and altering the contents of access tables.
Right now, I'm seeing peaks of about 300 authenticated users; I'm expecting this to grow about 200% when everyone gets back from summer break. I'm trying to look at system load statistics to reassure myself we'll be fine in a month -- or to panic and start throwing more hardware at things.

What is the difference between the SIZE and RES fields of top? Better yet, what does top(1) mean by "the total size of the process (text, data, and stack)" and "the current amount of resident memory"?

How does this work with a threaded program like apache? If all the threads share the same text and most (all?) of the same data pages, what's the best way to figure out the fixed cost and the average per-thread cost?

Some sample top output on this host:

Mem: 131M Active, 3754M Inact, 425M Wired, 177M Cache, 214M Buf, 3422M Free
Swap: 16G Total, 24K Used, 16G Free
[...]
  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
32361 root        1  96    0   106M 16604K select 2   0:02  0.00% httpd
50687 www         1  20    0   106M 17196K lockf  0   0:01  0.00% httpd

I'm having a hard time accounting for the 3.8GB of inactive memory (which, as I understand it, represents physical pages that are in use but not recently used, and so are prime candidates for being swapped out if the free page count gets low). Maybe a better understanding of the RES versus SIZE data, along with their relation to threads, will explain what's going on here.

One of my concerns is that a large chunk of memory is going to belong to the kernel in my configuration.
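To put the Mem line from the top output above in perspective, here is a minimal Python sketch that parses it and reports each category as a share of the total. The function names (`parse_mem_line`, `summarize`) are made up for illustration. It also assumes that Buf overlaps the other categories and so leaves it out of the total; that is my assumption, not something the output itself confirms.

```python
import re

# Mem line copied verbatim from the top output above.
MEM_LINE = "Mem: 131M Active, 3754M Inact, 425M Wired, 177M Cache, 214M Buf, 3422M Free"

# Conversion factors from top's unit suffixes to megabytes.
UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_mem_line(line):
    """Return {category: megabytes} parsed from a top(1) 'Mem:' line."""
    fields = {}
    for value, unit, name in re.findall(r"(\d+)([KMG])\s+(\w+)", line):
        fields[name] = int(value) * UNIT_MB[unit]
    return fields


def summarize(fields, skip=("Buf",)):
    """Sum all categories except those in `skip` and compute percentage shares.

    Skipping Buf is an assumption: it is treated here as overlapping
    the other categories rather than as a separate pool of pages.
    """
    counted = {name: mb for name, mb in fields.items() if name not in skip}
    total = sum(counted.values())
    shares = {name: 100 * mb / total for name, mb in counted.items()}
    return total, shares


fields = parse_mem_line(MEM_LINE)
total_mb, shares = summarize(fields)

print(f"total accounted for: {total_mb:.0f}M")
for name, pct in shares.items():
    print(f"{name:>6}: {fields[name]:6.0f}M  {pct:5.1f}%")
```

Under that assumption the categories add up to 7909M, with Inact as the largest single share, which is why the inactive figure is the one I most want to understand.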
I found vmstat -m (selected output lines follow):

|       libalias  5629  3251K       - 19760019  128
|          ifnet    13    25K       -       13  256,2048
|       dummynet    22     8K       -       26  256,512,1024
|   netgraph_msg     0     0K       -   101991  64,128,256,512,1024,4096
|  netgraph_node    72    18K       -    56133  256
|  netgraph_hook   284    36K       -    30204  128
|       netgraph   283    16K       -    30203  16,64,128
| netgraph_parse     0     0K       -    22650  16
|  netgraph_sock     0     0K       -    48581  128
|  netgraph_path     0     0K       -    71508  16,32

Does this really mean that my netgraph nodes (and their libalias instances) are really eating up less than 4MB of memory on the system? The only other "big spender" appears to be devbuf at 35185K.

I also found `netstat -m':

| 1026/1599/2625 mbufs in use (current/cache/total)
| 1023/1513/2536/25600 mbuf clusters in use (current/cache/total/max)
| 1/678 mbuf+clusters out of packet secondary zone in use (current/cache)
| 0/121/121/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
| 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
| 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
| 2302K/3909K/6212K bytes allocated to network (current/cache/total)
| 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
| 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
| 0/0/0 sfbufs in use (current/peak/max)
| 0 requests for sfbufs denied
| 0 requests for sfbufs delayed
| 60 requests for I/O initiated by sendfile
| 0 calls to protocol drain routines

Again, this looks like chump change against my top output. What category does kernel memory get lumped into in top?

I'd appreciate any help you can offer in terms of profiling memory usage and actually understanding what some of these figures mean.

-- 
Chris Cowart
Network Technical Lead
Network & Infrastructure Services, RSSP-IT
UC Berkeley