Re: problems with mmap() and disk caching

2012-04-11 Thread Andrey Zonov

On 10.04.2012 20:19, Alan Cox wrote:

On 04/09/2012 10:26, John Baldwin wrote:

On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:

On 04/04/2012 02:17, Konstantin Belousov wrote:

On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:

Hi,

I open the file, call mmap() on the whole file to get a pointer, and
then work with that pointer. I expect that each page only has to be
touched once to get it into memory (the disk cache?), but this doesn't
work!

I wrote a test (attached) and ran it against a 1 GB file generated from
/dev/random; the result is the following:

Prepare file:
# swapoff -a
# newfs /dev/ada0b
# mount /dev/ada0b /mnt
# dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024

Purge cache:
# umount /mnt
# mount /dev/ada0b /mnt

Run test:
$ ./mmap /mnt/random-1024 30
mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)
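The attached test program is not reproduced in this archive. The none/res columns above appear to be per-page mincore() results taken after each pass; a minimal sketch of one such pass, assuming a read-only MAP_SHARED mapping and counting only the "resident" bit (the super/other split is left out), might look like this. The name mmap_pass() is hypothetical.

```c
#define _DEFAULT_SOURCE  /* mincore() on glibc; no effect on FreeBSD */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the whole file read-only, touch one byte in every
 * page, then ask mincore() which pages are resident.  Returns the resident
 * page count (total pages in *npages_out), or -1 on error.
 */
long mmap_pass(const char *path, long *npages_out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;

    size_t npages = (len + page - 1) / page;
    volatile unsigned char sink = 0;
    for (size_t off = 0; off < len; off += page)
        sink ^= p[off];              /* fault each page in once */
    (void)sink;

    unsigned char *vec = malloc(npages);
    long resident = -1;
    /* (void *) converts to char * on FreeBSD and unsigned char * on Linux. */
    if (vec != NULL && mincore(p, len, (void *)vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            if (vec[i] & 1)          /* low bit set: page is resident */
                resident++;
        assert(resident <= (long)npages);
    }
    free(vec);
    munmap(p, len);
    *npages_out = (long)npages;
    return resident;
}
```

Calling it once per pass and timing each call gives output of the same shape as the listing above.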

If I run this before the test:
$ cat /mnt/random-1024 > /dev/null
then the result is the following:

$ ./mmap /mnt/random-1024 5
mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)

This is what I expect. But why doesn't this work without reading the file
manually first?
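Running cat over the file before the test just pulls every block through read(2) before the file is mapped. A minimal C equivalent, assuming a 1 MB buffer (the same block size as the dd command above) and using the hypothetical name prewarm_file(), could be:

```c
#define _DEFAULT_SOURCE  /* POSIX/BSD declarations on glibc; no effect on FreeBSD */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, roughly "cat path > /dev/null": read the whole file
 * with read(2) so its pages are cached before the file is mmap()ed.
 * Returns the number of bytes read, or -1 on error.
 */
long long prewarm_file(const char *path)
{
    enum { CHUNK = 1 << 20 };        /* 1 MB, like bs=1m in the dd command */
    long long total = 0;
    ssize_t n;
    char *buf;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    buf = malloc(CHUNK);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    while ((n = read(fd, buf, CHUNK)) > 0) {
        assert(n <= CHUNK);          /* read(2) never returns more than asked */
        total += n;
    }
    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}
```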

The issue seems to be caused by some change in the behaviour of the reserv
or phys allocator. I Cc:ed Alan.

I'm pretty sure that the behavior here hasn't significantly changed in
about twelve years. Otherwise, I agree with your analysis.

On more than one occasion, I've been tempted to change:

pmap_remove_all(mt);
if (mt->dirty != 0)
	vm_page_deactivate(mt);
else
	vm_page_cache(mt);

to:

vm_page_dontneed(mt);

because I suspect that the current code does more harm than good. In
theory, it saves activations of the page daemon. However, more often
than not, I suspect that we are spending more on page reactivations than
we are saving on page daemon activations. The sequential access
detection heuristic is just too easily triggered. For example, I've
seen it triggered by demand paging of the gcc text segment. Also, I
think that pmap_remove_all() and especially vm_page_cache() are too
severe for a detection heuristic that is so easily triggered.
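From userland, one way to experiment with the heuristic described here is to advise the kernel that a mapping is accessed randomly. This is a sketch under an assumption: that FreeBSD's fault handler skips its sequential read-ahead and cache-behind path (the pmap_remove_all()/vm_page_cache() code quoted above) for map entries advised with POSIX_MADV_RANDOM. Check vm_fault.c for your release before relying on it. map_random() is a hypothetical name.

```c
#define _DEFAULT_SOURCE  /* POSIX/BSD declarations on glibc; no effect on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a whole file read-only and advise the VM system
 * that it will be accessed randomly.  ASSUMPTION: on FreeBSD this advice
 * keeps vm_fault() from applying its sequential-access heuristic here.
 * Returns the mapping (length in *len_out) or MAP_FAILED on error.
 */
void *map_random(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd, rc;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return MAP_FAILED;

    /* Advice is only a hint; posix_madvise() returns an error number. */
    rc = posix_madvise(p, (size_t)st.st_size, POSIX_MADV_RANDOM);
    assert(rc != EINVAL);            /* p is page-aligned, straight from mmap() */
    (void)rc;
    *len_out = (size_t)st.st_size;
    return p;
}
```

Comparing the per-pass timings of a mapping set up this way against a plain mapping would show whether the advice changes anything on a given kernel.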

Are you planning to commit this?



Not yet. I did some tests with a file that was several times larger than
DRAM, and I didn't like what I saw. Initially, everything behaved as
expected, but about halfway through the test the bulk of the pages were
active. Despite the call to pmap_clear_reference() in
vm_page_dontneed(), the page daemon is finding the pages to be
referenced and reactivating them. The net result is that the time it
takes to 

cp -R from the mounted ufs disk image hangs in DL+ vnread

2012-04-11 Thread Yuri
I have an 82GB UFS image file (ufs-snapshot) mounted on some directory 
ufs-snapshot.mount. (mount /dev/`mdconfig -a -t vnode -f ufs-snapshot` 
ufs-snapshot.mount)


Command 'cp -R ufs-snapshot.mount/usr other-dir/' hung in the middle
with DL+ status:

$ ps ax | grep cp
73635  10  DL+ 0:12.19 cp -R ufs-snapshot.mount/usr other-dir/
'top' shows it in vnread state:
73635 root      1  20    0 10084K  2672K vnread  1   0:12  0.00% cp

When I ran 'ls' in the same mounted directory, it hung too, with D+ status:
$ ps ax | grep ls
75882   2  D+  0:00.00 ls ufs-snapshot.mount/

What is happening? Why did cp and ls hang?
I think cp -R hung first, and ls is now waiting on some operation initiated
by cp -R.

Somehow, cp -R managed to hang itself.

How can I find out what cp is waiting on?

9.0-STABLE amd64

Yuri
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Debugging zombies: pthread_sigmask and sigwait

2012-04-11 Thread Mel Flynn
Hi,

I'm currently stuck on a bug in Zarafa-spooler that creates zombies, and
I'm working around it by claiming that our pthread library isn't normal,
so that the spooler uses standard signals rather than a signal thread.

My limited understanding of these facilities is however not enough to
see the actual problem here and reading of related manpages did not lead
me to a solution either. A test case reproducing the problem is attached.

What happens is that SIGCHLD is never received by the signal thread and
the child processes turn to zombies. Signal counters never go up, not
even for SIGINFO, which I added specifically to see if anything gets
through at all.

The signal thread appears to be stuck in sigwait. It's reproducible on
8.3-PRERELEASE from a few days ago (r233768). Unfortunately I'm not able to
test it on anything newer, but I suspect this is a bug/linuxism in
the code, not in FreeBSD.

Thanks in advance for any insights.
-- 
Mel
PROG=spoolerbug
NO_MAN=yes
DEBUG_FLAGS=-g3
WARNS=6
WITH_DEBUG=yes
LDFLAGS+=-pthread

.include "../mk/core.mk"
.include <bsd.prog.mk>
/*
 * vim: ts=4 sw=4 tw=78 noet ai fdm=marker
 */
#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

#include <sys/types.h>
#include <sys/wait.h>

#include <pthread.h>
#include <signal.h> /* signal related */
#include <unistd.h> /* vfork */

#include <stdlib.h> /* arc4random() */
#include <stdbool.h>
#include <getopt.h>

#include <stdio.h> /* printing */

#include <err.h>

#define SERVER_ITERATIONS 3

/* declarations */
void *signal_handler(void *);
int running_server(void);
void process_signal(int);

/* globals */
pthread_t	signal_thread;
sigset_t	signal_mask;
bool		bQuit = false;
pid_t		lastPid = 0;
char		*szCommand;
size_t		n_sigs_handled = 0;
size_t		n_sigs_child = 0;
size_t		n_sigs_info = 0;

void *
signal_handler(void *args __unused)
{
	int sig;

	while( !bQuit && sigwait(&signal_mask, &sig) == 0 )
	{
		n_sigs_handled++;
		process_signal(sig);
	}

	return NULL;
}

int
running_server(void)
{
	u_int32_t r, max = 10;
	pid_t pid, me;
	int i = 0;

	me = getpid();
	warnx("[master]: Send SIGINFO to %u", (unsigned)me);
	do
	{
		warnx("[master]: lastPid = %u, n_sigs_handled=%zu, "
		    "n_sigs_child=%zu, "
		    "n_sigs_info=%zu", (unsigned)lastPid,
		    n_sigs_handled,
		    n_sigs_child, n_sigs_info);
		pid = vfork();
		if( pid < 0 )
			break;
		if( pid == 0 )
		{
			execl(szCommand, getprogname(), "-F", NULL);
			_exit(EXIT_FAILURE);
		}
		else
		{
			if( bQuit )
				break;
			warnx("[master]: Child spawned with pid %u",
			    (unsigned)pid);
			r = arc4random() % max;
			sleep((unsigned int)r);
		}
	} while( !bQuit && i++ < SERVER_ITERATIONS );
	return (0);
}

void
process_signal(int sig)
{
	int stat;
	pid_t pid;

	switch(sig)
	{
	case SIGTERM:
	case SIGINT:
		bQuit = true;
		break;
	case SIGCHLD:
		n_sigs_child++;
		while( (pid = waitpid(-1, &stat, WNOHANG)) > 0)
		{
			lastPid = pid;
		}
		break;
	case SIGINFO:
		n_sigs_info++;
		break;
	default:
		signal(sig, SIG_IGN);
		break;
	}
}

int
main(int argc, char *argv[])
{
	bool bForked = false;
	const char *opts = "F";
	int ch, hr, rc;

	szCommand = argv[0];
	while( (ch = getopt(argc, argv, opts)) != -1 )
	{
		if( ch == 'F' )
			bForked = true;
	}

	argc -= optind;
	argv += optind;

	if( !bForked )
	{
		sigemptyset(&signal_mask);
		sigaddset(&signal_mask, SIGTERM);
		sigaddset(&signal_mask, SIGINT);
		sigaddset(&signal_mask, SIGCHLD);
		sigaddset(&signal_mask, SIGINFO);
	}

	daemon(1, 1);
	if( !bForked )
	{
		rc = pthread_sigmask(SIG_BLOCK, &signal_mask, NULL);
		if( rc != 0 )
			err(EXIT_FAILURE, "pthread_sigmask()");

		pthread_create(&signal_thread, NULL, signal_handler, NULL);
		hr = running_server();
		warnx("[master]: Joining signal thread");
		pthread_join(signal_thread, NULL);
	}
	else
Re: Debugging zombies: pthread_sigmask and sigwait

2012-04-11 Thread Ian Lepore
On Wed, 2012-04-11 at 16:11 +0200, Mel Flynn wrote:
 Hi,
 
 I'm currently stuck on a bug in Zarafa-spooler that creates zombies, and
 I'm working around it by claiming that our pthread library isn't normal,
 so that the spooler uses standard signals rather than a signal thread.
 
 My limited understanding of these facilities is however not enough to
 see the actual problem here and reading of related manpages did not lead
 me to a solution either. A test case reproducing the problem is attached.
 
 What happens is that SIGCHLD is never received by the signal thread and
 the child processes turn to zombies. Signal counters never go up, not
 even for SIGINFO, which I added specifically to see if anything gets
 through at all.
 
 The signal thread appears to be stuck in sigwait. It's reproducible on
 8.3-PRERELEASE from a few days ago (r233768). Unfortunately I'm not able to
 test it on anything newer, but I suspect this is a bug/linuxism in
 the code, not in FreeBSD.
 
 Thanks in advance for any insights.

The signal mask for a new thread is inherited from the parent thread.
In your example code, the signal handling thread inherits the blocked
status of the signals as set up in main().  Try adding this line to
signal_handler() before it goes into its while() loop:

 pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL);

-- Ian




Re: Debugging zombies: pthread_sigmask and sigwait

2012-04-11 Thread Mel Flynn
On 4/11/2012 16:26, Ian Lepore wrote:
 On Wed, 2012-04-11 at 16:11 +0200, Mel Flynn wrote:

 What happens is that SIGCHLD is never received by the signal thread and
 the child processes turn to zombies. Signal counters never go up, not
 even for SIGINFO, which I added specifically to see if anything gets
 through at all.

 The signal thread appears to be stuck in sigwait. It's reproducible on
 8.3-PRERELEASE from a few days ago (r233768). Unfortunately I'm not able to
 test it on anything newer, but I suspect this is a bug/linuxism in
 the code, not in FreeBSD.

 The signal mask for a new thread is inherited from the parent thread.
 In your example code, the signal handling thread inherits the blocked
 status of the signals as set up in main().  Try adding this line to
 signal_handler() before it goes into its while() loop:
 
  pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL);

That doesn't change anything and is in contrast to what sigwait(2) says:

 The signals specified by set /should be blocked/ at the time of the
 call to sigwait().

I also thought about a different child touching the signal code and two
processes blocked in sigwait in the original code (they fork a logger
process prior to sigemptyset()), but I explicitly avoid that in the test
case.
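When a sigwait() thread never wakes up, two separate things are worth checking: whether the signal is blocked (which sigwait() requires) and whether its disposition lets the kernel discard it before it is ever queued, which is the explanation Konstantin Belousov gives in his reply in this thread. The hypothetical helper below reports both for the calling thread; its list of signals whose default action is "ignore" is deliberately partial.

```c
#define _DEFAULT_SOURCE  /* POSIX/BSD declarations on glibc; no effect on FreeBSD */
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>

/* Partial list of signals whose default action is to ignore them. */
static bool default_ignored(int sig)
{
    switch (sig) {
    case SIGCHLD:
    case SIGURG:
    case SIGWINCH:
#ifdef SIGINFO
    case SIGINFO:                    /* BSD-specific */
#endif
        return true;
    default:
        return false;
    }
}

/*
 * Hypothetical diagnostic: *blocked tells whether sig is blocked in the
 * calling thread; *may_discard tells whether its disposition is SIG_IGN, or
 * SIG_DFL with an "ignore" default action.
 */
void signal_state(int sig, bool *blocked, bool *may_discard)
{
    sigset_t cur;
    struct sigaction sa;
    int rc;

    /* With a NULL set, pthread_sigmask() only reports the current mask. */
    rc = pthread_sigmask(SIG_BLOCK, NULL, &cur);
    assert(rc == 0);
    *blocked = sigismember(&cur, sig) == 1;

    /* With a NULL act, sigaction() only reports the current disposition. */
    rc = sigaction(sig, NULL, &sa);
    assert(rc == 0);
    (void)rc;
    *may_discard = sa.sa_handler == SIG_IGN ||
        (sa.sa_handler == SIG_DFL && default_ignored(sig));
}
```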
-- 
Mel


Re: Debugging zombies: pthread_sigmask and sigwait

2012-04-11 Thread Konstantin Belousov
On Wed, Apr 11, 2012 at 08:26:13AM -0600, Ian Lepore wrote:
 The signal mask for a new thread is inherited from the parent thread.
 In your example code, the signal handling thread inherits the blocked
 status of the signals as set up in main().  Try adding this line to
 signal_handler() before it goes into its while() loop:
 
  pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL);

This is completely wrong. sigwait(2) requires the waited signals to be
blocked, so the code is right in this regard.

What happens, I guess, is that SIGINFO and SIGCHLD are ignored by default,
so the kernel does not even bother to queue the signals to the master
process. Register a dummy signal handler for your signals with sigaction
before creating the 'signal_handler' thread.
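A reduced, self-contained version of the pattern with this fix applied is sketched below. It is not the Zarafa code: struct reaper, reaper_thread() and spawn_and_reap() are hypothetical names, only SIGCHLD is waited for, and the thread stops once it has reaped the expected number of children.

```c
#define _DEFAULT_SOURCE  /* POSIX/BSD declarations on glibc; no effect on FreeBSD */
#include <assert.h>
#include <err.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct reaper {                      /* hypothetical, for this sketch only */
    sigset_t set;                    /* signals the thread sigwait()s for */
    int expected;                    /* children it should reap */
    int reaped;                      /* children reaped so far */
};

/* Not expected to run while SIGCHLD is blocked; it only replaces SIG_DFL. */
static void dummy_handler(int sig)
{
    (void)sig;
}

static void *reaper_thread(void *arg)
{
    struct reaper *r = arg;
    int sig, status;

    while (r->reaped < r->expected && sigwait(&r->set, &sig) == 0) {
        if (sig != SIGCHLD)
            continue;
        /* One SIGCHLD can stand for several exited children. */
        while (waitpid(-1, &status, WNOHANG) > 0)
            r->reaped++;
    }
    return NULL;
}

int spawn_and_reap(int nchildren)
{
    struct reaper r;
    struct sigaction sa;
    pthread_t tid;
    int rc;

    memset(&r, 0, sizeof(r));
    r.expected = nchildren;
    sigemptyset(&r.set);
    sigaddset(&r.set, SIGCHLD);

    /* The fix: install a real handler instead of the default disposition. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = dummy_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        err(1, "sigaction");

    /* sigwait() needs the signal blocked; the new thread inherits the mask. */
    rc = pthread_sigmask(SIG_BLOCK, &r.set, NULL);
    if (rc != 0)
        errx(1, "pthread_sigmask: %s", strerror(rc));
    rc = pthread_create(&tid, NULL, reaper_thread, &r);
    if (rc != 0)
        errx(1, "pthread_create: %s", strerror(rc));

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            err(1, "fork");
        if (pid == 0)
            _exit(0);                /* child: exit immediately */
    }
    pthread_join(tid, NULL);
    assert(r.reaped == nchildren);
    return r.reaped;
}
```

spawn_and_reap() leaves SIGCHLD blocked in the calling thread and the dummy handler installed, as a long-running daemon would.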




Re: Debugging zombies: pthread_sigmask and sigwait

2012-04-11 Thread Ian Lepore
On Wed, 2012-04-11 at 17:47 +0300, Konstantin Belousov wrote:
 On Wed, Apr 11, 2012 at 08:26:13AM -0600, Ian Lepore wrote:
  The signal mask for a new thread is inherited from the parent thread.
  In your example code, the signal handling thread inherits the blocked
  status of the signals as set up in main().  Try adding this line to
  signal_handler() before it goes into its while() loop:
  
   pthread_sigmask(SIG_UNBLOCK, &signal_mask, NULL);
 
 This is completely wrong. sigwait(2) requires the waited signals to be
 blocked, so the code is right in this regard.
 

Oops, sorry.  The code that sets up our signal-handling threads uses
SIG_SETMASK rather than SIG_BLOCK/SIG_UNBLOCK, and on a quick glance I
misinterpreted what it was doing.

-- Ian





Re: Debugging zombies: pthread_sigmask and sigwait

2012-04-11 Thread Mel Flynn
On 4/11/2012 16:47, Konstantin Belousov wrote:

 What happens, I guess, is that SIGINFO and SIGCHLD are ignored by default,
 so the kernel does not even bother to queue the signals to the master
 process. Register a dummy signal handler for your signals with sigaction
 before creating the 'signal_handler' thread.

Right on the mark. I've modified the test code accordingly and things
work as expected. I've also applied the logic to the Zarafa spooler and
in the logs I'm finally seeing:
child: [79572] E-mail for user mel was accepted by SMTP server
parent: [79565] Received signal 20
^^

Many thanks, and for the archives, the diff is below my sig.
-- 
Mel

diff -r 509d7301c720 spoolerbug/spoolerbug.c
--- a/spoolerbug/spoolerbug.c   Wed Apr 11 05:37:50 2012 -0800
+++ b/spoolerbug/spoolerbug.c   Wed Apr 11 07:35:50 2012 -0800
@@ -12,6 +12,7 @@
 #include <unistd.h> /* vfork */

 #include <stdlib.h> /* arc4random() */
+#include <string.h> /* memset() */
 #include <stdbool.h>
 #include <getopt.h>

@@ -25,6 +26,7 @@
 void *signal_handler(void *);
 int running_server(void);
 void process_signal(int);
+void signal_dummy(int);

 /* globals */
 pthread_t	signal_thread;
@@ -112,6 +114,12 @@
 	}
 }

+void
+signal_dummy(int sig __unused)
+{
+	return;
+}
+
 int
 main(int argc, char *argv[])
 {
@@ -131,11 +139,19 @@

 	if( !bForked )
 	{
+		struct sigaction dummies;
+
+		memset(&dummies, 0, sizeof(dummies));
 		sigemptyset(&signal_mask);
 		sigaddset(&signal_mask, SIGTERM);
 		sigaddset(&signal_mask, SIGINT);
 		sigaddset(&signal_mask, SIGCHLD);
 		sigaddset(&signal_mask, SIGINFO);
+		dummies.sa_handler = signal_dummy;
+		dummies.sa_mask = signal_mask;
+		dummies.sa_flags |= SA_NOCLDSTOP;
+		sigaction(SIGCHLD, &dummies, NULL);
+		sigaction(SIGINFO, &dummies, NULL);
 	}

 	daemon(1, 1);



Re: CAM disk I/O starvation

2012-04-11 Thread Gary Jennejohn
On Tue, 3 Apr 2012 14:27:43 -0700
Jerry Toung jryto...@gmail.com wrote:

 On 4/3/12, Gary Jennejohn gljennj...@googlemail.com wrote:
 
  It would be interesting to see your patch.  I always run HEAD but maybe
  I could use it as a base for my own mods/tests.
 
 
 Here is the patch
 

[patch removed]

Just for the archive: my bad disk performance seems to have been fixed in
HEAD by svn commit r234074.  It seems that all interrupts were being handled
by a single CPU/core (I have 6), which resulted in abysmal interrupt
handling when multiple disks were busy.

Since this commit my disk performance is back to normal, and long lags
are a thing of the past.

-- 
Gary Jennejohn


Re: cp -R from the mounted ufs disk image hangs in DL+ vnread

2012-04-11 Thread Yuri

I created a PR for this: http://www.freebsd.org/cgi/query-pr.cgi?pr=166851


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-11 Thread Adrian Chadd
The problem, IMHO, is none of this is in any way:

* documented;
* modellable by a user;
* explorable by a user (e.g. via an easy-to-use version of schedgraph that
explores things in a useful way).

Arnaud raises a valid point: he's given a synthetic benchmark whose
numbers are unpredictable, and he's asking why. There are plenty of
"complex systems interact complexly!" style answers, none of which are
in any way useful to an end-user.

Arnaud, have you ever used ktr/sched_graph to look at what's going on?
I think it'd be a worthwhile step to begin documenting what's going on
here. I'd also suggest (in a completely non-inflammatory way, so you
may not be the right person to write it :-) perhaps keeping some kind
of blog listing the tests you're doing and what the results of system
inspection are. I think that kind of thing would be very very helpful
for engineers and users who are looking to get better behaviour in
their use case.

This kind of thing is sorely lacking at the moment.



Adrian