Re: problems with mmap() and disk caching
On 10.04.2012 20:19, Alan Cox wrote:
> On 04/09/2012 10:26, John Baldwin wrote:
>> On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:
>>> On 04/04/2012 02:17, Konstantin Belousov wrote:
>>>> On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
>>>>> Hi,
>>>>>
>>>>> I open the file, then call mmap() on the whole file and get a pointer,
>>>>> then I work with this pointer. I expect that a page only needs to be
>>>>> touched once to get it into memory (disk cache?), but this doesn't work!
>>>>>
>>>>> I wrote a test (attached) and ran it on a 1G file generated from
>>>>> /dev/random; the result is the following:
>>>>>
>>>>> Prepare file:
>>>>> # swapoff -a
>>>>> # newfs /dev/ada0b
>>>>> # mount /dev/ada0b /mnt
>>>>> # dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
>>>>>
>>>>> Purge cache:
>>>>> # umount /mnt
>>>>> # mount /dev/ada0b /mnt
>>>>>
>>>>> Run test:
>>>>> $ ./mmap /mnt/random-1024 30
>>>>> mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
>>>>> mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
>>>>> mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
>>>>> mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
>>>>> mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
>>>>> mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
>>>>> mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
>>>>> mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
>>>>> mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
>>>>> mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
>>>>> mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
>>>>> mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
>>>>> mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
>>>>> mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
>>>>> mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
>>>>> mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
>>>>> mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
>>>>> mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
>>>>> mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
>>>>> mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
>>>>> mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
>>>>> mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
>>>>> mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
>>>>> mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
>>>>> mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
>>>>> mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
>>>>> mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
>>>>> mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)
>>>>>
>>>>> If I run this:
>>>>> $ cat /mnt/random-1024 > /dev/null
>>>>> before the test, then the result is the following:
>>>>>
>>>>> $ ./mmap /mnt/random-1024 5
>>>>> mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
>>>>> mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)
>>>>>
>>>>> This is what I expect. But why doesn't this work without reading the
>>>>> file manually?
>>>>
>>>> The issue seems to be in some change of the behaviour of the reserv or
>>>> phys allocator. I Cc:ed Alan.
>>>
>>> I'm pretty sure that the behavior here hasn't significantly changed in
>>> about twelve years. Otherwise, I agree with your analysis.
>>>
>>> On more than one occasion, I've been tempted to change:
>>>
>>>	pmap_remove_all(mt);
>>>	if (mt->dirty != 0)
>>>		vm_page_deactivate(mt);
>>>	else
>>>		vm_page_cache(mt);
>>>
>>> to:
>>>
>>>	vm_page_dontneed(mt);
>>>
>>> because I suspect that the current code does more harm than good. In
>>> theory, it saves activations of the page daemon. However, more often
>>> than not, I suspect that we are spending more on page reactivations than
>>> we are saving on page daemon activations. The sequential access
>>> detection heuristic is just too easily triggered. For example, I've seen
>>> it triggered by demand paging of the gcc text segment. Also, I think
>>> that pmap_remove_all() and especially vm_page_cache() are too severe for
>>> a detection heuristic that is so easily triggered.
>>
>> Are you planning to commit this?
>
> Not yet. I did some tests with a file that was several times larger than
> DRAM, and I didn't like what I saw. Initially, everything behaved as
> expected, but about halfway through the test the bulk of the pages were
> active. Despite the call to pmap_clear_reference() in
> vm_page_dontneed(), the page daemon is finding the pages to be
> referenced and reactivating them. The net result is that the time it
> takes to
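The test program attached to the original message is not preserved in this archive. As a rough illustration of the measurement discussed above (touch every page of a mapping once, then ask the kernel how many pages are resident), here is a minimal sketch. The helper name touch_and_count_resident() is hypothetical, and the sketch only distinguishes resident from non-resident pages; it does not reproduce the "super" and "other" columns of the output quoted above.

```c
#define _DEFAULT_SOURCE	/* exposes mincore() on glibc; harmless elsewhere */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef MINCORE_INCORE
#define MINCORE_INCORE 0x1	/* low bit means "resident" where the name is missing */
#endif

/*
 * Hypothetical helper, not the test attached to the thread.
 * Maps `path` read-only, reads one byte from every page, then uses
 * mincore() to count how many pages of the mapping are resident.
 * Returns the resident page count, or -1 on error; the total number
 * of pages is stored in *npages_out on success.
 */
long
touch_and_count_resident(const char *path, long *npages_out)
{
	struct stat st;
	volatile unsigned char sink = 0;	/* keeps the reads from being optimized away */
	unsigned char *base, *vec;
	long pagesz, npages, i, resident;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	if (fstat(fd, &st) != 0 || st.st_size <= 0) {
		close(fd);
		return (-1);
	}
	pagesz = sysconf(_SC_PAGESIZE);
	npages = (long)((st.st_size + pagesz - 1) / pagesz);

	base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (base == MAP_FAILED)
		return (-1);

	/* One read per page is enough to fault it in. */
	for (i = 0; i < npages; i++)
		sink ^= base[(size_t)i * (size_t)pagesz];

	resident = -1;
	vec = malloc((size_t)npages);
	/* The (void *) cast covers both the char * and unsigned char * prototypes. */
	if (vec != NULL &&
	    mincore(base, (size_t)st.st_size, (void *)vec) == 0) {
		resident = 0;
		for (i = 0; i < npages; i++)
			if (vec[i] & MINCORE_INCORE)
				resident++;
	}
	free(vec);
	munmap(base, (size_t)st.st_size);
	*npages_out = npages;
	return (resident);
}
```

Calling this helper repeatedly on a file from a freshly mounted filesystem and printing the returned count after each call would give a series comparable to the "res" column above.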
cp -R from the mounted ufs disk image hangs in DL+ vnread
I have an 82GB UFS image file (ufs-snapshot) mounted on some directory ufs-snapshot.mount:

(mount /dev/`mdconfig -a -t vnode -f ufs-snapshot` ufs-snapshot.mount)

The command 'cp -R ufs-snapshot.mount/usr other-dir/' hung in the middle with DL+ status:

$ ps ax | grep cp
73635 10 DL+ 0:12.19 cp -R ufs-snapshot.mount/usr other-dir/

'top' shows it in vnread state:

73635 root 1 20 0 10084K 2672K vnread 1 0:12 0.00% cp

When I ran 'ls' in the same mounted directory, it hung too, with D+ status:

$ ps ax | grep ls
75882 2 D+ 0:00.00 ls ufs-snapshot.mount/

What is happening? Why did cp and ls hang? I think cp -R hung first, and ls is now waiting on some operation initiated by cp -R. Somehow, cp -R managed to hang itself.

How can I find out what cp is waiting on?

9.0-STABLE amd64

Yuri
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Debugging zombies: pthread_sigmask and sigwait
Hi,

I'm currently stuck on a bug in Zarafa-spooler that creates zombies, and working around it by claiming that our pthread library isn't normal which uses standard signals rather than a signal thread.

My limited understanding of these facilities is however not enough to see the actual problem here, and reading the related manpages did not lead me to a solution either.

A test case reproducing the problem is attached. What happens is that SIGCHLD is never received by the signal thread and the child processes turn into zombies. Signal counters never go up, not even for SIGINFO, which I added specifically to see if anything gets through at all. The signal thread shows as being stuck in sigwait.

It's reproducible on 8.3-PRERELEASE of a few days ago (r233768). I'm not able to test it on anything newer unfortunately, but I suspect this is a bug/linuxism in the code, not in FreeBSD.

Thanks in advance for any insights.
--
Mel

PROG=spoolerbug
NO_MAN=yes
DEBUG_FLAGS=-g3
WARNS=6
WITH_DEBUG=yes
LDFLAGS+=-pthread

.include "../mk/core.mk"
.include <bsd.prog.mk>

/*
 * vim: ts=4 sw=4 tw=78 noet ai fdm=marker
 */
#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

#include <sys/types.h>
#include <sys/wait.h>

#include <pthread.h>
#include <signal.h> /* signal related */

#include <unistd.h> /* vfork */
#include <stdlib.h> /* arc4random() */
#include <stdbool.h>
#include <getopt.h>

#include <stdio.h> /* printing */
#include <err.h>

#define SERVER_ITERATIONS 3

/* declarations */
void *signal_handler(void *);
int running_server(void);
void process_signal(int);

/* globals */
pthread_t signal_thread;
sigset_t signal_mask;
bool bQuit = false;
pid_t lastPid = 0;
char *szCommand;
size_t n_sigs_handled = 0;
size_t n_sigs_child = 0;
size_t n_sigs_info = 0;

void *
signal_handler(void *args __unused)
{
	int sig;

	while( !bQuit && sigwait(&signal_mask, &sig) == 0 )
	{
		n_sigs_handled++;
		process_signal(sig);
	}

	return NULL;
}

int
running_server(void)
{
	u_int32_t r, max = 10;
	pid_t pid, me;
	int i = 0;

	me = getpid();
	warnx("[master]: Send SIGINFO to %u", (unsigned)me);
	do
	{
		warnx("[master]: lastPid = %u, n_sigs_handled=%zu, n_sigs_child=%zu "
			"n_sigs_info=%zu", (unsigned)lastPid, n_sigs_handled,
			n_sigs_child, n_sigs_info);
		pid = vfork();
		if( pid < 0 )
			break;
		if( pid == 0 )
		{
			execl(szCommand, getprogname(), "-F", NULL);
			_exit(EXIT_FAILURE);
		}
		else
		{
			if( bQuit )
				break;
			warnx("[master]: Child spawned with pid %u", (unsigned)pid);
			r = arc4random() % max;
			sleep((unsigned int)r);
		}
	} while( !bQuit && i++ < SERVER_ITERATIONS );

	return (0);
}

void
process_signal(int sig)
{
	int stat;
	pid_t pid;

	switch(sig)
	{
		case SIGTERM:
		case SIGINT:
			bQuit = true;
			break;
		case SIGCHLD:
			n_sigs_child++;
			while( (pid = waitpid(-1, &stat, WNOHANG)) > 0)
			{
				lastPid = pid;
			}
			break;
		case SIGINFO:
			n_sigs_info++;
			break;
		default:
			signal(sig, SIG_IGN);
			break;
	}
}

int
main(int argc, char *argv[])
{
	bool bForked = false;
	const char *opts = "F";
	int ch, hr, rc;

	szCommand = argv[0];
	while( (ch = getopt(argc, argv, opts)) != -1 )
	{
		if( ch == 'F' )
			bForked = true;
	}
	argc -= optind;
	argv += optind;

	if( !bForked )
	{
		sigemptyset(&signal_mask);
		sigaddset(&signal_mask, SIGTERM);
		sigaddset(&signal_mask, SIGINT);
		sigaddset(&signal_mask, SIGCHLD);
		sigaddset(&signal_mask, SIGINFO);
	}
	daemon(1, 1);

	if( !bForked )
	{
		rc = pthread_sigmask(SIG_BLOCK, &signal_mask, NULL);
		if( rc != 0 )
			err(EXIT_FAILURE, "pthread_sigmask()");
		pthread_create(&signal_thread, NULL, signal_handler, NULL);
		hr = running_server();
		warnx("[master]: Joining signal thread");
		pthread_join(signal_thread, NULL);
	}
	else
Re: Debugging zombies: pthread_sigmask and sigwait
On Wed, 2012-04-11 at 16:11 +0200, Mel Flynn wrote:
> Hi,
>
> I'm currently stuck on a bug in Zarafa-spooler that creates zombies, and
> working around it by claiming that our pthread library isn't normal
> which uses standard signals rather than a signal thread.
>
> My limited understanding of these facilities is however not enough to
> see the actual problem here, and reading the related manpages did not
> lead me to a solution either.
>
> A test case reproducing the problem is attached. What happens is that
> SIGCHLD is never received by the signal thread and the child processes
> turn into zombies. Signal counters never go up, not even for SIGINFO,
> which I added specifically to see if anything gets through at all. The
> signal thread shows as being stuck in sigwait.
>
> It's reproducible on 8.3-PRERELEASE of a few days ago (r233768). I'm not
> able to test it on anything newer unfortunately, but I suspect this is a
> bug/linuxism in the code, not in FreeBSD.
>
> Thanks in advance for any insights.

The signal mask for a new thread is inherited from the parent thread. In your example code, the signal handling thread inherits the blocked status of the signals as set up in main(). Try adding this line to signal_handler() before it goes into its while() loop:

	pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL);

--
Ian
Re: Debugging zombies: pthread_sigmask and sigwait
On 4/11/2012 16:26, Ian Lepore wrote:
> On Wed, 2012-04-11 at 16:11 +0200, Mel Flynn wrote:
>> What happens is that SIGCHLD is never received by the signal thread and
>> the child processes turn to zombies. Signal counters never go up, not
>> even for SIGINFO, which I added specifically to see if anything gets
>> through at all. The signal thread shows being stuck in sigwait.
>>
>> It's reproducible on 8.3-PRERELEASE of a few days ago (r233768). I'm
>> not able to test it on anything newer unfortunately, but I suspect this
>> is a bug/linuxism in the code not in FreeBSD.
>
> The signal mask for a new thread is inherited from the parent thread. In
> your example code, the signal handling thread inherits the blocked
> status of the signals as set up in main(). Try adding this line to
> signal_handler() before it goes into its while() loop:
>
>	pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL);

That doesn't change anything and is in contrast to what sigwait(2) says:

	The signals specified by set /should be blocked/ at the time of
	the call to sigwait().

I also thought about a different child touching the signal code and two processes blocked in sigwait in the original code (they fork a logger process prior to sigemptyset()), but I explicitly avoid that in the test case.

--
Mel
Re: Debugging zombies: pthread_sigmask and sigwait
On Wed, Apr 11, 2012 at 08:26:13AM -0600, Ian Lepore wrote:
> On Wed, 2012-04-11 at 16:11 +0200, Mel Flynn wrote:
>> Hi,
>>
>> I'm currently stuck on a bug in Zarafa-spooler that creates zombies, and
>> working around it by claiming that our pthread library isn't normal
>> which uses standard signals rather than a signal thread.
>>
>> My limited understanding of these facilities is however not enough to
>> see the actual problem here, and reading the related manpages did not
>> lead me to a solution either.
>>
>> A test case reproducing the problem is attached. What happens is that
>> SIGCHLD is never received by the signal thread and the child processes
>> turn into zombies. Signal counters never go up, not even for SIGINFO,
>> which I added specifically to see if anything gets through at all. The
>> signal thread shows as being stuck in sigwait.
>>
>> It's reproducible on 8.3-PRERELEASE of a few days ago (r233768). I'm
>> not able to test it on anything newer unfortunately, but I suspect this
>> is a bug/linuxism in the code, not in FreeBSD.
>>
>> Thanks in advance for any insights.
>
> The signal mask for a new thread is inherited from the parent thread. In
> your example code, the signal handling thread inherits the blocked
> status of the signals as set up in main(). Try adding this line to
> signal_handler() before it goes into its while() loop:
>
>	pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL);

This is completely wrong. sigwait(2) requires the waited signals to be blocked, so the code is right in this regard.

What happens, I guess, is that SIGINFO and SIGCHLD are ignored, so the kernel does not even bother to queue the signals to the master process. Register a dummy signal handler for your signals with sigaction before creating the 'signal_handler' thread.
Re: Debugging zombies: pthread_sigmask and sigwait
On Wed, 2012-04-11 at 17:47 +0300, Konstantin Belousov wrote:
> On Wed, Apr 11, 2012 at 08:26:13AM -0600, Ian Lepore wrote:
>> On Wed, 2012-04-11 at 16:11 +0200, Mel Flynn wrote:
>>> Hi,
>>>
>>> I'm currently stuck on a bug in Zarafa-spooler that creates zombies,
>>> and working around it by claiming that our pthread library isn't
>>> normal which uses standard signals rather than a signal thread.
>>>
>>> My limited understanding of these facilities is however not enough to
>>> see the actual problem here, and reading the related manpages did not
>>> lead me to a solution either.
>>>
>>> A test case reproducing the problem is attached. What happens is that
>>> SIGCHLD is never received by the signal thread and the child processes
>>> turn into zombies. Signal counters never go up, not even for SIGINFO,
>>> which I added specifically to see if anything gets through at all. The
>>> signal thread shows as being stuck in sigwait.
>>>
>>> It's reproducible on 8.3-PRERELEASE of a few days ago (r233768). I'm
>>> not able to test it on anything newer unfortunately, but I suspect
>>> this is a bug/linuxism in the code, not in FreeBSD.
>>>
>>> Thanks in advance for any insights.
>>
>> The signal mask for a new thread is inherited from the parent thread.
>> In your example code, the signal handling thread inherits the blocked
>> status of the signals as set up in main(). Try adding this line to
>> signal_handler() before it goes into its while() loop:
>>
>>	pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL);
>
> This is completely wrong. sigwait(2) requires the waited signals to be
> blocked, so the code is right in this regard.

Ooops, sorry. The code that sets up our signal handling threads uses SIG_SETMASK rather than BLOCK/UNBLOCK, and my quick glance at it misinterpreted what it was doing.

--
Ian
Re: Debugging zombies: pthread_sigmask and sigwait
On 4/11/2012 16:47, Konstantin Belousov wrote:
> What happens, I guess, is that SIGINFO and SIGCHLD are ignored, so the
> kernel does not even bother to queue the signals to the master process.
> Register a dummy signal handler for your signals with sigaction before
> creating the 'signal_handler' thread.

Right on the mark. I've modified the test code accordingly and things work as expected. I've also applied the logic to the Zarafa spooler and in the logs I'm finally seeing:

child: [79572] E-mail for user mel was accepted by SMTP server
parent: [79565] Received signal 20
                                ^^

Many thanks, and for the archives, the diff is below the sig.

--
Mel

diff -r 509d7301c720 spoolerbug/spoolerbug.c
--- a/spoolerbug/spoolerbug.c	Wed Apr 11 05:37:50 2012 -0800
+++ b/spoolerbug/spoolerbug.c	Wed Apr 11 07:35:50 2012 -0800
@@ -12,6 +12,7 @@
 
 #include <unistd.h> /* vfork */
 #include <stdlib.h> /* arc4random() */
+#include <string.h> /* memset() */
 #include <stdbool.h>
 #include <getopt.h>
 
@@ -25,6 +26,7 @@
 void *signal_handler(void *);
 int running_server(void);
 void process_signal(int);
+void signal_dummy(int);
 
 /* globals */
 pthread_t signal_thread;
@@ -112,6 +114,12 @@
 	}
 }
 
+void
+signal_dummy(int sig __unused)
+{
+	return;
+}
+
 int
 main(int argc, char *argv[])
 {
@@ -131,11 +139,19 @@
 
 	if( !bForked )
 	{
+		struct sigaction dummies;
+
+		memset(&dummies, 0, sizeof(dummies));
 		sigemptyset(&signal_mask);
 		sigaddset(&signal_mask, SIGTERM);
 		sigaddset(&signal_mask, SIGINT);
 		sigaddset(&signal_mask, SIGCHLD);
 		sigaddset(&signal_mask, SIGINFO);
+		dummies.sa_handler = signal_dummy;
+		dummies.sa_mask = signal_mask;
+		dummies.sa_flags |= SA_NOCLDSTOP;
+		sigaction(SIGCHLD, &dummies, NULL);
+		sigaction(SIGINFO, &dummies, NULL);
 	}
 	daemon(1, 1);
Re: CAM disk I/O starvation
On Tue, 3 Apr 2012 14:27:43 -0700 Jerry Toung <jryto...@gmail.com> wrote:
> On 4/3/12, Gary Jennejohn <gljennj...@googlemail.com> wrote:
>> It would be interesting to see your patch. I always run HEAD but maybe
>> I could use it as a base for my own mods/tests.
>
> Here is the patch

[patch removed]

Just for the archive: my bad disk performance seems to have been fixed in HEAD by svn commit r234074. It seems that all interrupts were being handled by a single CPU/core (I have 6), which resulted in abysmal interrupt handling when multiple disks were busy.

Since this commit my disk performance is back to normal and long lags are a thing of the past.

--
Gary Jennejohn
Re: cp -R from the mounted ufs disk image hangs in DL+ vnread
I created a PR for this: http://www.freebsd.org/cgi/query-pr.cgi?pr=166851
Re: [RFT][patch] Scheduling for HTT and not only
The problem, IMHO, is none of this is in any way:

* documented;
* modellable by a user;
* explorable by a user (e.g. by an easy version of schedgraph to explore things in a useful way).

Arnaud raises a valid point - he's given a synthetic benchmark whose numbers are unpredictable. He's asking why. There are plenty of "complex systems interact complexly!" style answers, none of which are in any way useful to an end-user.

Arnaud, have you ever used ktr/sched_graph to look at what's going on? I think it'd be a worthwhile step to begin documenting what's going on here.

I'd also suggest (in a completely non-inflammatory way, so you may not be the right person to write it :-) perhaps keeping some kind of blog listing the tests you're doing and what the results of system inspection are. I think that kind of thing would be very very helpful for engineers and users who are looking to get better behaviour in their use case. This kind of thing is sorely lacking at the moment.

Adrian