Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-11-08 Thread Martin Pieuchot
On 20/06/13(Thu) 15:18, Martin Pelikan wrote:
  And p_sigmask is copied during fork, so that needs a bit of thought.
  I guess it doesn't matter for the new child, as it isn't running yet
  and therefore can't invoke sigprocmask(2).  And the parent should be
  safe as well, as it is in fork(2) and therefore can't call
  sigprocmask(2) either.
 
 memcpy(p_startcopy) which is, tadaaa, p_sigmask!  From implementations
 of memcpy I've seen (usage of SSE or non-temporal instructions to begin
 with) it seems that no guarantee about how big a chunk does it copy
 atomically can be assumed...
 
 So a multithreaded process doing sigprocmask(2) in one thread and
 simultaneously fork(2) in the other?  Not a problem either, because
 1003.1 2004 edition says that sigprocmask(2) is equivalent to
 pthread_sigmask(3) and the fork only copies the other thread, not
 bothering with the sigprocmask-ing one.
 I haven't checked the code that thoroughly but this is what POSIX says.
 
  There's also a couple of additional p-p_sigmask |= ... in kern_sig.c.
  
  All of these need to become atomic_setbits_int/atomic_clearbits_int calls.
 
 Does the third one do the charm?  I haven't compiled the other arch's,
 but i386 is running fine.

So what's going on with this diff?  fwiw I tried it on macppc and amd64.

I'm also wondering, don't you think ptsignal() needs to be modified as
well?  Because it might access p_sigmask various times, so are you sure
your diff won't introduce any side effect if the mask is modified in
between?  Or is it simply impossible?

 --
 Martin Pelikan
 
 
 Index: arch/hp300/hp300/trap.c
 ===
 RCS file: /cvs/src/sys/arch/hp300/hp300/trap.c,v
 retrieving revision 1.62
 diff -u -p -r1.62 trap.c
 --- arch/hp300/hp300/trap.c   31 Dec 2012 06:46:13 -  1.62
 +++ arch/hp300/hp300/trap.c   20 Jun 2013 12:51:46 -
 @@ -321,7 +321,7 @@ dopanic:
   i = sigmask(SIGILL);
   p-p_sigacts-ps_sigignore = ~i;
   p-p_sigacts-ps_sigcatch = ~i;
 - p-p_sigmask = ~i;
 + atomic_clearbits_int(p-p_sigmask, i);
   i = SIGILL;
   ucode = frame.f_format; /* XXX was ILL_RESAD_FAULT */
   typ = ILL_COPROC;
 Index: arch/mvme68k/mvme68k/trap.c
 ===
 RCS file: /cvs/src/sys/arch/mvme68k/mvme68k/trap.c,v
 retrieving revision 1.77
 diff -u -p -r1.77 trap.c
 --- arch/mvme68k/mvme68k/trap.c   31 Dec 2012 06:46:13 -  1.77
 +++ arch/mvme68k/mvme68k/trap.c   20 Jun 2013 12:51:47 -
 @@ -260,7 +260,7 @@ copyfault:
   i = sigmask(SIGILL);
   p-p_sigacts-ps_sigignore = ~i;
   p-p_sigacts-ps_sigcatch = ~i;
 - p-p_sigmask = ~i;
 + atomic_clearbits_int(p-p_sigmask, i);
   i = SIGILL;
   ucode = frame.f_format; /* XXX was ILL_RESAD_FAULT */
   typ = ILL_COPROC;
 Index: compat/linux/linux_signal.c
 ===
 RCS file: /cvs/src/sys/compat/linux/linux_signal.c,v
 retrieving revision 1.15
 diff -u -p -r1.15 linux_signal.c
 --- compat/linux/linux_signal.c   19 Jun 2012 11:35:29 -  1.15
 +++ compat/linux/linux_signal.c   20 Jun 2013 12:51:49 -
 @@ -584,7 +584,6 @@ linux_sys_sigprocmask(p, v, retval)
   linux_old_sigset_t ss;
   sigset_t bs;
   int error = 0;
 - int s;
  
   *retval = 0;
  
 @@ -604,19 +603,18 @@ linux_sys_sigprocmask(p, v, retval)
  
   linux_old_to_bsd_sigset(ss, bs);
  
 - s = splhigh();
 -
 + bs = ~sigcantmask;
   switch (SCARG(uap, how)) {
   case LINUX_SIG_BLOCK:
 - p-p_sigmask |= bs  ~sigcantmask;
 + atomic_setbits_int(p-p_sigmask, bs);
   break;
  
   case LINUX_SIG_UNBLOCK:
 - p-p_sigmask = ~bs;
 + atomic_clearbits_int(p-p_sigmask, bs);
   break;
  
   case LINUX_SIG_SETMASK:
 - p-p_sigmask = bs  ~sigcantmask;
 + p-p_sigmask = bs;
   break;
  
   default:
 @@ -624,8 +622,6 @@ linux_sys_sigprocmask(p, v, retval)
   break;
   }
  
 - splx(s);
 -
   return (error);
  }
  
 @@ -644,7 +640,6 @@ linux_sys_rt_sigprocmask(p, v, retval)
   linux_sigset_t ls;
   sigset_t bs;
   int error = 0;
 - int s;
  
   if (SCARG(uap, sigsetsize) != sizeof(linux_sigset_t))
   return (EINVAL);
 @@ -667,19 +662,18 @@ linux_sys_rt_sigprocmask(p, v, retval)
  
   linux_to_bsd_sigset(ls, bs);
  
 - s = splhigh();
 -
 + bs = ~sigcantmask;
   switch (SCARG(uap, how)) {
   case LINUX_SIG_BLOCK:
 - p-p_sigmask |= bs  ~sigcantmask;
 + atomic_setbits_int(p-p_sigmask, bs);
   break;
  
   case LINUX_SIG_UNBLOCK:
 - p-p_sigmask = ~bs;
 + 

Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-11-08 Thread Mark Kettenis
 Date: Fri, 8 Nov 2013 12:34:42 +0100
 From: Martin Pieuchot mpieuc...@nolizard.org
 
 On 20/06/13(Thu) 15:18, Martin Pelikan wrote:
   And p_sigmask is copied during fork, so that needs a bit of thought.
   I guess it doesn't matter for the new child, as it isn't running yet
   and therefore can't invoke sigprocmask(2).  And the parent should be
   safe as well, as it is in fork(2) and therefore can't call
   sigprocmask(2) either.
  
  memcpy(p_startcopy) which is, tadaaa, p_sigmask!  From implementations
  of memcpy I've seen (usage of SSE or non-temporal instructions to begin
  with) it seems that no guarantee about how big a chunk does it copy
  atomically can be assumed...
  
  So a multithreaded process doing sigprocmask(2) in one thread and
  simultaneously fork(2) in the other?  Not a problem either, because
  1003.1 2004 edition says that sigprocmask(2) is equivalent to
  pthread_sigmask(3) and the fork only copies the other thread, not
  bothering with the sigprocmask-ing one.
  I haven't checked the code that thoroughly but this is what POSIX says.
  
   There's also a couple of additional p-p_sigmask |= ... in kern_sig.c.
   
   All of these need to become atomic_setbits_int/atomic_clearbits_int calls.
  
  Does the third one do the charm?  I haven't compiled the other arch's,
  but i386 is running fine.
 
 So what's going on with this diff?  fwiw I tried it on macppc and amd64.
 
 I'm also wondering, don't you think ptsignal() needs to be modified as
 well?  Because it might access p_sigmask various times, so are you sure
 your diff won't introduce any side effect if the mask is modified in
 between?  Or is it simply impossible?

I came to the conclusion that in its current form, this diff is *not*
safe :(.  There is indeed a problem in ptsignal().  Essentially there
is a race between deciding to which thread to deliver the signal to
and the actual delivery.  This particularly applies to the case where
you are blocking a signal; unblocking a signal is probably safe.

  Index: arch/hp300/hp300/trap.c
  ===
  RCS file: /cvs/src/sys/arch/hp300/hp300/trap.c,v
  retrieving revision 1.62
  diff -u -p -r1.62 trap.c
  --- arch/hp300/hp300/trap.c 31 Dec 2012 06:46:13 -  1.62
  +++ arch/hp300/hp300/trap.c 20 Jun 2013 12:51:46 -
  @@ -321,7 +321,7 @@ dopanic:
  i = sigmask(SIGILL);
  p-p_sigacts-ps_sigignore = ~i;
  p-p_sigacts-ps_sigcatch = ~i;
  -   p-p_sigmask = ~i;
  +   atomic_clearbits_int(p-p_sigmask, i);
  i = SIGILL;
  ucode = frame.f_format; /* XXX was ILL_RESAD_FAULT */
  typ = ILL_COPROC;
  Index: arch/mvme68k/mvme68k/trap.c
  ===
  RCS file: /cvs/src/sys/arch/mvme68k/mvme68k/trap.c,v
  retrieving revision 1.77
  diff -u -p -r1.77 trap.c
  --- arch/mvme68k/mvme68k/trap.c 31 Dec 2012 06:46:13 -  1.77
  +++ arch/mvme68k/mvme68k/trap.c 20 Jun 2013 12:51:47 -
  @@ -260,7 +260,7 @@ copyfault:
  i = sigmask(SIGILL);
  p-p_sigacts-ps_sigignore = ~i;
  p-p_sigacts-ps_sigcatch = ~i;
  -   p-p_sigmask = ~i;
  +   atomic_clearbits_int(p-p_sigmask, i);
  i = SIGILL;
  ucode = frame.f_format; /* XXX was ILL_RESAD_FAULT */
  typ = ILL_COPROC;
  Index: compat/linux/linux_signal.c
  ===
  RCS file: /cvs/src/sys/compat/linux/linux_signal.c,v
  retrieving revision 1.15
  diff -u -p -r1.15 linux_signal.c
  --- compat/linux/linux_signal.c 19 Jun 2012 11:35:29 -  1.15
  +++ compat/linux/linux_signal.c 20 Jun 2013 12:51:49 -
  @@ -584,7 +584,6 @@ linux_sys_sigprocmask(p, v, retval)
  linux_old_sigset_t ss;
  sigset_t bs;
  int error = 0;
  -   int s;
   
  *retval = 0;
   
  @@ -604,19 +603,18 @@ linux_sys_sigprocmask(p, v, retval)
   
  linux_old_to_bsd_sigset(ss, bs);
   
  -   s = splhigh();
  -
  +   bs = ~sigcantmask;
  switch (SCARG(uap, how)) {
  case LINUX_SIG_BLOCK:
  -   p-p_sigmask |= bs  ~sigcantmask;
  +   atomic_setbits_int(p-p_sigmask, bs);
  break;
   
  case LINUX_SIG_UNBLOCK:
  -   p-p_sigmask = ~bs;
  +   atomic_clearbits_int(p-p_sigmask, bs);
  break;
   
  case LINUX_SIG_SETMASK:
  -   p-p_sigmask = bs  ~sigcantmask;
  +   p-p_sigmask = bs;
  break;
   
  default:
  @@ -624,8 +622,6 @@ linux_sys_sigprocmask(p, v, retval)
  break;
  }
   
  -   splx(s);
  -
  return (error);
   }
   
  @@ -644,7 +640,6 @@ linux_sys_rt_sigprocmask(p, v, retval)
  linux_sigset_t ls;
  sigset_t bs;
  int error = 0;
  -   int s;
   
  if (SCARG(uap, sigsetsize) != sizeof(linux_sigset_t))
  return (EINVAL);
  @@ -667,19 +662,18 @@ 

Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-20 Thread Otto Moerbeek
On Wed, Jun 19, 2013 at 07:39:15PM +, Miod Vallat wrote:

-   p-p_sigmask = mask ~ sigcantmask;
+   p-p_sigmask = mask;
  
  On the right architecture where a word store isn't atomic enough and
  with the right compiler that decides to put p_sigmask on an address
  ending with 0xFFF with 4k-sized pages, we have two problems already.
 
 Holy Pumpkin forbid, struct proc layout is sane enough for fields to be
 aligned to their natural alignment, and the allocator will return
 properly aligned structs as well.
 
  I'm only asking if such a situation can happen, or if there is some
  ensure_this_assignment_is_always_atomic(p-p_sigmask, mask); function
  that I missed.
 
 If such a function exists, it should be shot, if only for having a too
 long name.
 
 Miod

But watch out, as kettenis@ already mentioned elsewhere, this can only
be safe if *all* reads and modifications to the mask are done in an
atomic way. 

Straight read and assignment of int values is safe, afaik, but e.g.
struct assignment and memcpy are *not* atomic. 

-Otto




Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-20 Thread Stuart Henderson
On 2013/06/20 09:38, Otto Moerbeek wrote:
 On Wed, Jun 19, 2013 at 07:39:15PM +, Miod Vallat wrote:
 
 - p-p_sigmask = mask ~ sigcantmask;
 + p-p_sigmask = mask;
   
   On the right architecture where a word store isn't atomic enough and
   with the right compiler that decides to put p_sigmask on an address
   ending with 0xFFF with 4k-sized pages, we have two problems already.
  
  Holy Pumpkin forbid, struct proc layout is sane enough for fields to be
  aligned to their natural alignment, and the allocator will return
  properly aligned structs as well.
  
   I'm only asking if such a situation can happen, or if there is some
   ensure_this_assignment_is_always_atomic(p-p_sigmask, mask); function
   that I missed.
  
  If such a function exists, it should be shot, if only for having a too
  long name.
  
  Miod
 
 But watch out, as kettenis@ already mentioned elsewhere, this can only
 be safe if *all* reads and modifications to the mask are done in an
 atomic way. 
 
 Straight read and assignment of int values is safe, afaik, but e.g.
 struct assignment and memcpy are *not* atomic. 

I spotted a few places where p_sigmask is = or |='d, for example these
from linux compat, might these be an issue?

p-p_sigmask = ~bs;
p-p_sigmask |= bs  ~sigcantmask;



Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-20 Thread Martin Pelikan
 I think this is only changed on the context of curproc, so there will
 only be one dude modifying it at a time.
 Problem I see is if there is other code which reads p_sigmask twice,
 since it losed the atomicity between two reads, don't know what can
 happen.

This was exactly my point.  Now that we've established aligned int writes
are fine, there's the wrong assumption that two reads from that memory
(even created by the compiler) must have to have the same value.

 On a quick look it doesn't seem to happen.

Looking at the code more closely:

- DIAGNOSTIC panic(postsig action) could be triggered due to this
  (maybe not, I haven't checked where is postsig() called from and if
  the user has a chance to change the sigmask, but I don't see the point
  of that panic there -- what do others think?)
- I completely forgot about compat_linux syscalls.  This new version
  fixes them as well (yes the code is duplicated, do we want to change
  that?) and then there was one useless splhigh().

 Also, is there any situation where reading the old value is not ok ?
 this would only happen if another cpu tries to check p_sigmask of
 something else than curproc.

It seems there isn't.
--
Martin Pelikan


Index: compat/linux/linux_signal.c
===
RCS file: /cvs/src/sys/compat/linux/linux_signal.c,v
retrieving revision 1.15
diff -u -p -r1.15 linux_signal.c
--- compat/linux/linux_signal.c 19 Jun 2012 11:35:29 -  1.15
+++ compat/linux/linux_signal.c 20 Jun 2013 10:12:40 -
@@ -584,7 +584,6 @@ linux_sys_sigprocmask(p, v, retval)
linux_old_sigset_t ss;
sigset_t bs;
int error = 0;
-   int s;
 
*retval = 0;
 
@@ -604,19 +603,18 @@ linux_sys_sigprocmask(p, v, retval)
 
linux_old_to_bsd_sigset(ss, bs);
 
-   s = splhigh();
-
+   bs = ~sigcantmask;
switch (SCARG(uap, how)) {
case LINUX_SIG_BLOCK:
-   p-p_sigmask |= bs  ~sigcantmask;
+   atomic_setbits_int(p-p_sigmask, bs);
break;
 
case LINUX_SIG_UNBLOCK:
-   p-p_sigmask = ~bs;
+   atomic_clearbits_int(p-p_sigmask, bs);
break;
 
case LINUX_SIG_SETMASK:
-   p-p_sigmask = bs  ~sigcantmask;
+   p-p_sigmask = bs;
break;
 
default:
@@ -624,8 +622,6 @@ linux_sys_sigprocmask(p, v, retval)
break;
}
 
-   splx(s);
-
return (error);
 }
 
@@ -644,7 +640,6 @@ linux_sys_rt_sigprocmask(p, v, retval)
linux_sigset_t ls;
sigset_t bs;
int error = 0;
-   int s;
 
if (SCARG(uap, sigsetsize) != sizeof(linux_sigset_t))
return (EINVAL);
@@ -667,19 +662,18 @@ linux_sys_rt_sigprocmask(p, v, retval)
 
linux_to_bsd_sigset(ls, bs);
 
-   s = splhigh();
-
+   bs = ~sigcantmask;
switch (SCARG(uap, how)) {
case LINUX_SIG_BLOCK:
-   p-p_sigmask |= bs  ~sigcantmask;
+   atomic_setbits_int(p-p_sigmask, bs);
break;
 
case LINUX_SIG_UNBLOCK:
-   p-p_sigmask = ~bs;
+   atomic_clearbits_int(p-p_sigmask, bs);
break;
 
case LINUX_SIG_SETMASK:
-   p-p_sigmask = bs  ~sigcantmask;
+   p-p_sigmask = bs;
break;
 
default:
@@ -687,8 +681,6 @@ linux_sys_rt_sigprocmask(p, v, retval)
break;
}
 
-   splx(s);
-
return (error);
 }
 
@@ -727,16 +719,13 @@ linux_sys_sigsetmask(p, v, retval)
} */ *uap = v;
linux_old_sigset_t mask;
sigset_t bsdsig;
-   int s;
 
bsd_to_linux_old_sigset(p-p_sigmask, (linux_old_sigset_t *)retval);
 
mask = SCARG(uap, mask);
bsd_to_linux_old_sigset(bsdsig, mask);
 
-   s = splhigh();
p-p_sigmask = bsdsig  ~sigcantmask;
-   splx(s);
 
return (0);
 }
Index: kern/init_sysent.c
===
RCS file: /cvs/src/sys/kern/init_sysent.c,v
retrieving revision 1.145
diff -u -p -r1.145 init_sysent.c
--- kern/init_sysent.c  9 Jun 2013 13:10:27 -   1.145
+++ kern/init_sysent.c  20 Jun 2013 10:12:41 -
@@ -136,7 +136,7 @@ struct sysent sysent[] = {
sys_sigaction },/* 46 = sigaction */
{ 0, 0, 0,
sys_getgid },   /* 47 = getgid */
-   { 2, s(struct sys_sigprocmask_args), 0,
+   { 2, s(struct sys_sigprocmask_args), SY_NOLOCK | 0,
sys_sigprocmask },  /* 48 = sigprocmask */
{ 2, s(struct sys_getlogin_args), 0,
sys_getlogin }, /* 49 = getlogin */
Index: kern/kern_sig.c
===
RCS file: /cvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.152
diff -u -p -r1.152 kern_sig.c
--- kern/kern_sig.c 1 Jun 2013 

Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-20 Thread Paul Irofti
On Thu, Jun 20, 2013 at 12:35:32PM +0200, Martin Pelikan wrote:
  I think this is only changed on the context of curproc, so there will
  only be one dude modifying it at a time.
  Problem I see is if there is other code which reads p_sigmask twice,
  since it losed the atomicity between two reads, don't know what can
  happen.
 
 This was exactly my point.  Now that we've established aligned int writes
 are fine, there's the wrong assumption that two reads from that memory
 (even created by the compiler) must have to have the same value.
 
  On a quick look it doesn't seem to happen.
 
 Looking at the code more closely:
 
 - DIAGNOSTIC panic(postsig action) could be triggered due to this
   (maybe not, I haven't checked where is postsig() called from and if
   the user has a chance to change the sigmask, but I don't see the point
   of that panic there -- what do others think?)
 - I completely forgot about compat_linux syscalls.  This new version
   fixes them as well (yes the code is duplicated, do we want to change
   that?) and then there was one useless splhigh().
 
  Also, is there any situation where reading the old value is not ok ?
  this would only happen if another cpu tries to check p_sigmask of
  something else than curproc.
 
 It seems there isn't.
 --
 Martin Pelikan
 
 
 Index: compat/linux/linux_signal.c
 ===
 RCS file: /cvs/src/sys/compat/linux/linux_signal.c,v
 retrieving revision 1.15
 diff -u -p -r1.15 linux_signal.c
 --- compat/linux/linux_signal.c   19 Jun 2012 11:35:29 -  1.15
 +++ compat/linux/linux_signal.c   20 Jun 2013 10:12:40 -
 @@ -584,7 +584,6 @@ linux_sys_sigprocmask(p, v, retval)
   linux_old_sigset_t ss;
   sigset_t bs;
   int error = 0;
 - int s;
  
   *retval = 0;
  
 @@ -604,19 +603,18 @@ linux_sys_sigprocmask(p, v, retval)
  
   linux_old_to_bsd_sigset(ss, bs);
  
 - s = splhigh();
 -
 + bs = ~sigcantmask;
   switch (SCARG(uap, how)) {
   case LINUX_SIG_BLOCK:
 - p-p_sigmask |= bs  ~sigcantmask;
 + atomic_setbits_int(p-p_sigmask, bs);
   break;
  
   case LINUX_SIG_UNBLOCK:
 - p-p_sigmask = ~bs;
 + atomic_clearbits_int(p-p_sigmask, bs);

I'm pretty sure this is not correct with your change to bs before the
switch statement.

Same for the similar code below.

   break;
  
   case LINUX_SIG_SETMASK:
 - p-p_sigmask = bs  ~sigcantmask;
 + p-p_sigmask = bs;
   break;
  
   default:
 @@ -624,8 +622,6 @@ linux_sys_sigprocmask(p, v, retval)
   break;
   }
  
 - splx(s);
 -
   return (error);
  }
  
 @@ -644,7 +640,6 @@ linux_sys_rt_sigprocmask(p, v, retval)
   linux_sigset_t ls;
   sigset_t bs;
   int error = 0;
 - int s;
  
   if (SCARG(uap, sigsetsize) != sizeof(linux_sigset_t))
   return (EINVAL);
 @@ -667,19 +662,18 @@ linux_sys_rt_sigprocmask(p, v, retval)
  
   linux_to_bsd_sigset(ls, bs);
  
 - s = splhigh();
 -
 + bs = ~sigcantmask;
   switch (SCARG(uap, how)) {
   case LINUX_SIG_BLOCK:
 - p-p_sigmask |= bs  ~sigcantmask;
 + atomic_setbits_int(p-p_sigmask, bs);
   break;
  
   case LINUX_SIG_UNBLOCK:
 - p-p_sigmask = ~bs;
 + atomic_clearbits_int(p-p_sigmask, bs);
   break;
  
   case LINUX_SIG_SETMASK:
 - p-p_sigmask = bs  ~sigcantmask;
 + p-p_sigmask = bs;
   break;
  
   default:
 @@ -687,8 +681,6 @@ linux_sys_rt_sigprocmask(p, v, retval)
   break;
   }
  
 - splx(s);
 -
   return (error);
  }
  
 @@ -727,16 +719,13 @@ linux_sys_sigsetmask(p, v, retval)
   } */ *uap = v;
   linux_old_sigset_t mask;
   sigset_t bsdsig;
 - int s;
  
   bsd_to_linux_old_sigset(p-p_sigmask, (linux_old_sigset_t *)retval);
  
   mask = SCARG(uap, mask);
   bsd_to_linux_old_sigset(bsdsig, mask);
  
 - s = splhigh();
   p-p_sigmask = bsdsig  ~sigcantmask;
 - splx(s);
  
   return (0);
  }
 Index: kern/init_sysent.c
 ===
 RCS file: /cvs/src/sys/kern/init_sysent.c,v
 retrieving revision 1.145
 diff -u -p -r1.145 init_sysent.c
 --- kern/init_sysent.c9 Jun 2013 13:10:27 -   1.145
 +++ kern/init_sysent.c20 Jun 2013 10:12:41 -
 @@ -136,7 +136,7 @@ struct sysent sysent[] = {
   sys_sigaction },/* 46 = sigaction */
   { 0, 0, 0,
   sys_getgid },   /* 47 = getgid */
 - { 2, s(struct sys_sigprocmask_args), 0,
 + { 2, s(struct sys_sigprocmask_args), SY_NOLOCK | 0,
   sys_sigprocmask },  /* 48 = sigprocmask */
   { 2, s(struct sys_getlogin_args), 0,
   sys_getlogin }, /* 49 = getlogin */
 

Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-20 Thread Mark Kettenis
 Date: Thu, 20 Jun 2013 11:30:11 +0100
 From: Stuart Henderson st...@openbsd.org
 
 On 2013/06/20 09:38, Otto Moerbeek wrote:
  On Wed, Jun 19, 2013 at 07:39:15PM +, Miod Vallat wrote:
  
  -   p-p_sigmask = mask ~ sigcantmask;
  +   p-p_sigmask = mask;

On the right architecture where a word store isn't atomic enough and
with the right compiler that decides to put p_sigmask on an address
ending with 0xFFF with 4k-sized pages, we have two problems already.
   
   Holy Pumpkin forbid, struct proc layout is sane enough for fields to be
   aligned to their natural alignment, and the allocator will return
   properly aligned structs as well.
   
I'm only asking if such a situation can happen, or if there is some
ensure_this_assignment_is_always_atomic(p-p_sigmask, mask); function
that I missed.
   
   If such a function exists, it should be shot, if only for having a too
   long name.
   
   Miod
  
  But watch out, as kettenis@ already mentioned elsewhere, this can only
  be safe if *all* reads and modifications to the mask are done in an
  atomic way. 
  
  Straight read and assignment of int values is safe, afaik, but e.g.
  struct assignment and memcpy are *not* atomic. 

And p_sigmask is copied during fork, so that needs a bit of thought.
I guess it doesn't matter for the new child, as it isn't running yet
and therefore can't invoke sigprocmask(2).  And the parent should be
safe as well, as it is in fork(2) and therefore can't call
sigprocmask(2) either.

 I spotted a few places where p_sigmask is = or |='d, for example these
 from linux compat, might these be an issue?
 
 p-p_sigmask = ~bs;
 p-p_sigmask |= bs  ~sigcantmask;

There's also a couple of additional p-p_sigmask |= ... in kern_sig.c.

All of these need to become atomic_setbits_int/atomic_clearbits_int calls.



Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-20 Thread Mark Kettenis
 Date: Wed, 19 Jun 2013 13:40:19 +0200
 From: Martin Pelikan martin.peli...@gmail.com
 
  If you're right that atomic_{clear,set}bits_int is correct and
  sufficient and actually faster, then all dynamic executables would
  benefit from this speedup (sigprocmask is used in ld.so(1)).
 
 Since on i386 GENERIC these atomic_* things don't emit the LOCK prefix,
 performance shouldn't be an issue; I'm actually more worried about this bit:
 
   - p-p_sigmask = mask ~ sigcantmask;
   + p-p_sigmask = mask;
 
 On the right architecture where a word store isn't atomic enough and
 with the right compiler that decides to put p_sigmask on an address
 ending with 0xFFF with 4k-sized pages, we have two problems already.
 
 I'm only asking if such a situation can happen, or if there is some
 ensure_this_assignment_is_always_atomic(p-p_sigmask, mask); function
 that I missed.

There isn't one.  Unfortunately, we might need one to support SMP on
hppa, where we have the emulate atomic operations using a lock.  In
fact you need to worry about reading as well.  And since ptsignal()
looks at the p_sigmask of other threads in the process, it is not
immediately obvious there isn't a problem here.

I think the fact that only the thread itself can change its sigmask
means that there isn't an issue here.  But that probably means that
bothering with atomic_setbits_int/atomic_clearbits_int isn't necessary
in the first place.



Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-20 Thread Mark Kettenis
 Date: Thu, 20 Jun 2013 14:25:49 +0300
 From: Paul Irofti p...@irofti.net
 
  +   bs = ~sigcantmask;
  switch (SCARG(uap, how)) {
  case LINUX_SIG_BLOCK:
  -   p-p_sigmask |= bs  ~sigcantmask;
  +   atomic_setbits_int(p-p_sigmask, bs);
  break;
   
  case LINUX_SIG_UNBLOCK:
  -   p-p_sigmask = ~bs;
  +   atomic_clearbits_int(p-p_sigmask, bs);
 
 I'm pretty sure this is not correct with your change to bs before the
 switch statement.

No that's fine.  The bits in sigcantmask should never be set.  So

atomic_clearbits_int(p-p_sigmask, bs  ~sigcanmask);

is equivalent to

atomic_clearbits_int(p-p_sigmask, bs);



Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-20 Thread Paul Irofti
On Thu, Jun 20, 2013 at 02:01:26PM +0200, Mark Kettenis wrote:
  Date: Thu, 20 Jun 2013 14:25:49 +0300
  From: Paul Irofti p...@irofti.net
  
   + bs = ~sigcantmask;
 switch (SCARG(uap, how)) {
 case LINUX_SIG_BLOCK:
   - p-p_sigmask |= bs  ~sigcantmask;
   + atomic_setbits_int(p-p_sigmask, bs);
 break;

 case LINUX_SIG_UNBLOCK:
   - p-p_sigmask = ~bs;
   + atomic_clearbits_int(p-p_sigmask, bs);
  
  I'm pretty sure this is not correct with your change to bs before the
  switch statement.
 
 No that's fine.  The bits in sigcantmask should never be set.  So
 
 atomic_clearbits_int(p-p_sigmask, bs  ~sigcanmask);
 
 is equivalent to
 
 atomic_clearbits_int(p-p_sigmask, bs);

Right, nevermind. Carry on.



Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-20 Thread Martin Pelikan
  I'm only asking if such a situation can happen, or if there is some
  ensure_this_assignment_is_always_atomic(p-p_sigmask, mask); function
  that I missed.
 
 There isn't one.  Unfortunately, we might need one to support SMP on
 hppa, where we have the emulate atomic operations using a lock.  In
 fact you need to worry about reading as well.  And since ptsignal()
 looks at the p_sigmask of other threads in the process, it is not
 immediately obvious there isn't a problem here.

Does this mean that on a potentially SMP hppa just an unlocked aligned
word load can produce invalid data?  The HP's reference seems to be
'unavailable'...

 I think the fact that only the thread itself can change its sigmask
 means that there isn't an issue here.  But that probably means that
 bothering with atomic_setbits_int/atomic_clearbits_int isn't necessary
 in the first place.

Why was the splhigh() there in the first place?  I thought that on i386
and the like, AND to a memory location isn't atomic (without the LOCK
prefix) and therefore other thread reading the value might read garbage.
Whereas MOV is ok.  Obviously, sane architectures doing LD, AND and ST
are ok as well, as long as the ST is atomic (I don't have any experience
with hppa and random search on the internet suggested that pa-riscs
do have word stores atomically).
--
Martin Pelikan



Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-20 Thread Martin Pelikan
 And p_sigmask is copied during fork, so that needs a bit of thought.
 I guess it doesn't matter for the new child, as it isn't running yet
 and therefore can't invoke sigprocmask(2).  And the parent should be
 safe as well, as it is in fork(2) and therefore can't call
 sigprocmask(2) either.

memcpy(p_startcopy) which is, tadaaa, p_sigmask!  From implementations
of memcpy I've seen (usage of SSE or non-temporal instructions to begin
with) it seems that no guarantee about how big a chunk does it copy
atomically can be assumed...

So a multithreaded process doing sigprocmask(2) in one thread and
simultaneously fork(2) in the other?  Not a problem either, because
1003.1 2004 edition says that sigprocmask(2) is equivalent to
pthread_sigmask(3) and the fork only copies the other thread, not
bothering with the sigprocmask-ing one.
I haven't checked the code that thoroughly but this is what POSIX says.

 There's also a couple of additional p-p_sigmask |= ... in kern_sig.c.
 
 All of these need to become atomic_setbits_int/atomic_clearbits_int calls.

Does the third one do the charm?  I haven't compiled the other arch's,
but i386 is running fine.
--
Martin Pelikan


Index: arch/hp300/hp300/trap.c
===
RCS file: /cvs/src/sys/arch/hp300/hp300/trap.c,v
retrieving revision 1.62
diff -u -p -r1.62 trap.c
--- arch/hp300/hp300/trap.c 31 Dec 2012 06:46:13 -  1.62
+++ arch/hp300/hp300/trap.c 20 Jun 2013 12:51:46 -
@@ -321,7 +321,7 @@ dopanic:
i = sigmask(SIGILL);
p-p_sigacts-ps_sigignore = ~i;
p-p_sigacts-ps_sigcatch = ~i;
-   p-p_sigmask = ~i;
+   atomic_clearbits_int(p-p_sigmask, i);
i = SIGILL;
ucode = frame.f_format; /* XXX was ILL_RESAD_FAULT */
typ = ILL_COPROC;
Index: arch/mvme68k/mvme68k/trap.c
===
RCS file: /cvs/src/sys/arch/mvme68k/mvme68k/trap.c,v
retrieving revision 1.77
diff -u -p -r1.77 trap.c
--- arch/mvme68k/mvme68k/trap.c 31 Dec 2012 06:46:13 -  1.77
+++ arch/mvme68k/mvme68k/trap.c 20 Jun 2013 12:51:47 -
@@ -260,7 +260,7 @@ copyfault:
i = sigmask(SIGILL);
p-p_sigacts-ps_sigignore = ~i;
p-p_sigacts-ps_sigcatch = ~i;
-   p-p_sigmask = ~i;
+   atomic_clearbits_int(p-p_sigmask, i);
i = SIGILL;
ucode = frame.f_format; /* XXX was ILL_RESAD_FAULT */
typ = ILL_COPROC;
Index: compat/linux/linux_signal.c
===
RCS file: /cvs/src/sys/compat/linux/linux_signal.c,v
retrieving revision 1.15
diff -u -p -r1.15 linux_signal.c
--- compat/linux/linux_signal.c 19 Jun 2012 11:35:29 -  1.15
+++ compat/linux/linux_signal.c 20 Jun 2013 12:51:49 -
@@ -584,7 +584,6 @@ linux_sys_sigprocmask(p, v, retval)
linux_old_sigset_t ss;
sigset_t bs;
int error = 0;
-   int s;
 
*retval = 0;
 
@@ -604,19 +603,18 @@ linux_sys_sigprocmask(p, v, retval)
 
linux_old_to_bsd_sigset(ss, bs);
 
-   s = splhigh();
-
+   bs = ~sigcantmask;
switch (SCARG(uap, how)) {
case LINUX_SIG_BLOCK:
-   p-p_sigmask |= bs  ~sigcantmask;
+   atomic_setbits_int(p-p_sigmask, bs);
break;
 
case LINUX_SIG_UNBLOCK:
-   p-p_sigmask = ~bs;
+   atomic_clearbits_int(p-p_sigmask, bs);
break;
 
case LINUX_SIG_SETMASK:
-   p-p_sigmask = bs  ~sigcantmask;
+   p-p_sigmask = bs;
break;
 
default:
@@ -624,8 +622,6 @@ linux_sys_sigprocmask(p, v, retval)
break;
}
 
-   splx(s);
-
return (error);
 }
 
@@ -644,7 +640,6 @@ linux_sys_rt_sigprocmask(p, v, retval)
linux_sigset_t ls;
sigset_t bs;
int error = 0;
-   int s;
 
if (SCARG(uap, sigsetsize) != sizeof(linux_sigset_t))
return (EINVAL);
@@ -667,19 +662,18 @@ linux_sys_rt_sigprocmask(p, v, retval)
 
linux_to_bsd_sigset(ls, bs);
 
-   s = splhigh();
-
+   bs = ~sigcantmask;
switch (SCARG(uap, how)) {
case LINUX_SIG_BLOCK:
-   p-p_sigmask |= bs  ~sigcantmask;
+   atomic_setbits_int(p-p_sigmask, bs);
break;
 
case LINUX_SIG_UNBLOCK:
-   p-p_sigmask = ~bs;
+   atomic_clearbits_int(p-p_sigmask, bs);
break;
 
case LINUX_SIG_SETMASK:
-   p-p_sigmask = bs  ~sigcantmask;
+   p-p_sigmask = bs;
break;
 
default:
@@ -687,8 +681,6 @@ linux_sys_rt_sigprocmask(p, v, retval)
break;
}
 
-   splx(s);
-
return (error);
 }
 
@@ -727,16 +719,13 @@ linux_sys_sigsetmask(p, v, retval)

help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-19 Thread Martin Pelikan
Hi!

Recently, I had to unleash my crappy Atom and decided to run OpenBSD on it.
Then I became irritated with the performance of most X11 applications, and
started poking around:
 - libfreetype.so is quite common
 - it does setjmp (disguised as ft_setjmp) quite a lot
 - setjmp needs to do sigprocmask
 - sigprocmask (I presume) needs the kernel lock.

sigprocmask(2) is quite simple actually, but wasn't marked as NOLOCK,
so I made it so.  And then I made a pointless test to demonstrate my point,
and ran it on my pointless Atom N270 (two threads, one core from 1980s)
as well as in VMware, which is the only 4-core amd64 machine I can get
on right now.  (did I say anything about using virtual machines to measure
things?)

#include setjmp.h

int
main(void)
{
jmp_buf env;
unsigned i;

/* on amd64 this was too slow, so 20M was used instead */
for (i = 0; i  400; ++i) {
setjmp(env);
}
return (0);
}

Before:
Atom, i386, single process, 0m3.80s real 0m1.15s user 0m2.50s system
Atom, i386, two processes,  0m4.81s real 0m1.44s user 0m3.09s system

amd64, single process, i20M,   0m3.35s real 0m0.65s user 0m2.23s system
amd64, two processes, i20M,0m4.48s real 0m0.66s user 0m3.37s system

After:
Atom, i386, single process, 0m2.93s real 0m0.95s user 0m1.35s system
Atom, i386, two processes,  0m3.67s real 0m1.36s user 0m2.15s system

amd64, single process, i20M,   0m2.14s real 0m0.63s user 0m1.12s system
amd64, two processes, i20M,0m2.57s real 0m0.57s user 0m1.55s system


If anyone has a real test with some real-world application using lots
of setjmps, I'd like to see it.  What'd I like to see most, is if does
it work on other architectures and if it is indeed correct.  Although
the numbers suggest quite an improvement, bear in mind this kind of
load does not at all resemble any software actually doing useful stuff.

Any comments?
--
Martin Pelikan


Index: kern/init_sysent.c
===
RCS file: /cvs/src/sys/kern/init_sysent.c,v
retrieving revision 1.145
diff -u -p -r1.145 init_sysent.c
--- kern/init_sysent.c  9 Jun 2013 13:10:27 -   1.145
+++ kern/init_sysent.c  19 Jun 2013 09:49:57 -
@@ -136,7 +136,7 @@ struct sysent sysent[] = {
sys_sigaction },/* 46 = sigaction */
{ 0, 0, 0,
sys_getgid },   /* 47 = getgid */
-   { 2, s(struct sys_sigprocmask_args), 0,
+   { 2, s(struct sys_sigprocmask_args), SY_NOLOCK | 0,
sys_sigprocmask },  /* 48 = sigprocmask */
{ 2, s(struct sys_getlogin_args), 0,
sys_getlogin }, /* 49 = getlogin */
Index: kern/kern_sig.c
===
RCS file: /cvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.152
diff -u -p -r1.152 kern_sig.c
--- kern/kern_sig.c 1 Jun 2013 04:05:26 -   1.152
+++ kern/kern_sig.c 19 Jun 2013 09:49:57 -
@@ -429,28 +429,25 @@ sys_sigprocmask(struct proc *p, void *v,
syscallarg(sigset_t) mask;
} */ *uap = v;
int error = 0;
-   int s;
sigset_t mask;
 
*retval = p-p_sigmask;
-   mask = SCARG(uap, mask);
-   s = splhigh();
+   mask = SCARG(uap, mask) ~ sigcantmask;
 
switch (SCARG(uap, how)) {
case SIG_BLOCK:
-   p-p_sigmask |= mask ~ sigcantmask;
+   atomic_setbits_int(p-p_sigmask, mask);
break;
case SIG_UNBLOCK:
-   p-p_sigmask = ~mask;
+   atomic_clearbits_int(p-p_sigmask, mask);
break;
case SIG_SETMASK:
-   p-p_sigmask = mask ~ sigcantmask;
+   p-p_sigmask = mask;
break;
default:
error = EINVAL;
break;
}
-   splx(s);
return (error);
 }
 
Index: kern/syscalls.master
===
RCS file: /cvs/src/sys/kern/syscalls.master,v
retrieving revision 1.133
diff -u -p -r1.133 syscalls.master
--- kern/syscalls.master9 Jun 2013 13:10:19 -   1.133
+++ kern/syscalls.master19 Jun 2013 09:49:57 -
@@ -123,7 +123,7 @@
const struct sigaction *nsa, \
struct sigaction *osa); }
 47 STD { gid_t sys_getgid(void); }
-48 STD { int sys_sigprocmask(int how, sigset_t mask); }
+48 STD NOLOCK  { int sys_sigprocmask(int how, sigset_t mask); }
 49 STD { int sys_getlogin(char *namebuf, u_int namelen); }
 50 STD { int sys_setlogin(const char *namebuf); }
 #ifdef ACCOUNTING



Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-19 Thread Jérémie Courrèges-Anglas
Martin Pelikan martin.peli...@gmail.com writes:

 Hi!

 Recently, I had to unleash my crappy Atom and decided to run OpenBSD on it.
 Then I became irritated with the performance of most X11 applications, and
 started poking around:
  - libfreetype.so is quite common
  - it does setjmp (disguised as ft_setjmp) quite a lot
  - setjmp needs to do sigprocmask
  - sigprocmask (I presume) needs the kernel lock.

If you're right that atomic_{clear,set}bits_int is correct and
sufficient and actually faster, then all dynamic executables would
benefit from this speedup (sigprocmask is used in ld.so(1)).

 sigprocmask(2) is quite simple actually, but wasn't marked as NOLOCK,
 so I made it so.  And then I made a pointless test to demonstrate my point,
 and ran it on my pointless Atom N270 (two threads, one core from 1980s)
 as well as in VMware, which is the only 4-core amd64 machine I can get
 on right now.  (did I say anything about using virtual machines to measure
 things?)

 #include setjmp.h

 int
 main(void)
 {
 jmp_buf env;
 unsigned i;

   /* on amd64 this was too slow, so 20M was used instead */
 for (i = 0; i  400; ++i) {
 setjmp(env);
 }
 return (0);
 }

 Before:
 Atom, i386, single process,   0m3.80s real 0m1.15s user 0m2.50s system
 Atom, i386, two processes,0m4.81s real 0m1.44s user 0m3.09s system

 amd64, single process, i20M, 0m3.35s real 0m0.65s user 0m2.23s system
 amd64, two processes, i20M,  0m4.48s real 0m0.66s user 0m3.37s system

 After:
 Atom, i386, single process,   0m2.93s real 0m0.95s user 0m1.35s system
 Atom, i386, two processes,0m3.67s real 0m1.36s user 0m2.15s system

 amd64, single process, i20M, 0m2.14s real 0m0.63s user 0m1.12s system
 amd64, two processes, i20M,  0m2.57s real 0m0.57s user 0m1.55s system


 If anyone has a real test with some real-world application using lots
 of setjmps, I'd like to see it.  What'd I like to see most, is if does
 it work on other architectures and if it is indeed correct.  Although
 the numbers suggest quite an improvement, bear in mind this kind of
 load does not at all resemble any software actually doing useful stuff.

 Any comments?
 --
 Martin Pelikan


 Index: kern/init_sysent.c
 ===
 RCS file: /cvs/src/sys/kern/init_sysent.c,v
 retrieving revision 1.145
 diff -u -p -r1.145 init_sysent.c
 --- kern/init_sysent.c9 Jun 2013 13:10:27 -   1.145
 +++ kern/init_sysent.c19 Jun 2013 09:49:57 -
 @@ -136,7 +136,7 @@ struct sysent sysent[] = {
   sys_sigaction },/* 46 = sigaction */
   { 0, 0, 0,
   sys_getgid },   /* 47 = getgid */
 - { 2, s(struct sys_sigprocmask_args), 0,
 + { 2, s(struct sys_sigprocmask_args), SY_NOLOCK | 0,
   sys_sigprocmask },  /* 48 = sigprocmask */
   { 2, s(struct sys_getlogin_args), 0,
   sys_getlogin }, /* 49 = getlogin */
 Index: kern/kern_sig.c
 ===
 RCS file: /cvs/src/sys/kern/kern_sig.c,v
 retrieving revision 1.152
 diff -u -p -r1.152 kern_sig.c
 --- kern/kern_sig.c   1 Jun 2013 04:05:26 -   1.152
 +++ kern/kern_sig.c   19 Jun 2013 09:49:57 -
 @@ -429,28 +429,25 @@ sys_sigprocmask(struct proc *p, void *v,
   syscallarg(sigset_t) mask;
   } */ *uap = v;
   int error = 0;
 - int s;
   sigset_t mask;
  
   *retval = p-p_sigmask;
 - mask = SCARG(uap, mask);
 - s = splhigh();
 + mask = SCARG(uap, mask) ~ sigcantmask;
  
   switch (SCARG(uap, how)) {
   case SIG_BLOCK:
 - p-p_sigmask |= mask ~ sigcantmask;
 + atomic_setbits_int(p-p_sigmask, mask);
   break;
   case SIG_UNBLOCK:
 - p-p_sigmask = ~mask;
 + atomic_clearbits_int(p-p_sigmask, mask);
   break;
   case SIG_SETMASK:
 - p-p_sigmask = mask ~ sigcantmask;
 + p-p_sigmask = mask;
   break;
   default:
   error = EINVAL;
   break;
   }
 - splx(s);
   return (error);
  }
  
 Index: kern/syscalls.master
 ===
 RCS file: /cvs/src/sys/kern/syscalls.master,v
 retrieving revision 1.133
 diff -u -p -r1.133 syscalls.master
 --- kern/syscalls.master  9 Jun 2013 13:10:19 -   1.133
 +++ kern/syscalls.master  19 Jun 2013 09:49:57 -
 @@ -123,7 +123,7 @@
   const struct sigaction *nsa, \
   struct sigaction *osa); }
  47   STD { gid_t sys_getgid(void); }
 -48   STD { int sys_sigprocmask(int how, sigset_t mask); }
 +48   STD NOLOCK  { int sys_sigprocmask(int how, sigset_t mask); }
  49   STD { int 

Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-19 Thread Martin Pelikan
 If you're right that atomic_{clear,set}bits_int is correct and
 sufficient and actually faster, then all dynamic executables would
 benefit from this speedup (sigprocmask is used in ld.so(1)).

Since on i386 GENERIC these atomic_* things don't emit the LOCK prefix,
performance shouldn't be an issue; I'm actually more worried about this bit:

  -   p-p_sigmask = mask ~ sigcantmask;
  +   p-p_sigmask = mask;

On the right architecture where a word store isn't atomic enough and
with the right compiler that decides to put p_sigmask on an address
ending with 0xFFF with 4k-sized pages, we have two problems already.

I'm only asking if such a situation can happen, or if there is some
ensure_this_assignment_is_always_atomic(p-p_sigmask, mask); function
that I missed.
--
Martin Pelikan



Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-19 Thread Marc Espie
On Wed, Jun 19, 2013 at 01:40:19PM +0200, Martin Pelikan wrote:
  If you're right that atomic_{clear,set}bits_int is correct and
  sufficient and actually faster, then all dynamic executables would
  benefit from this speedup (sigprocmask is used in ld.so(1)).
 
 Since on i386 GENERIC these atomic_* things don't emit the LOCK prefix,
 performance shouldn't be an issue; I'm actually more worried about this bit:
 
   - p-p_sigmask = mask ~ sigcantmask;
   + p-p_sigmask = mask;
 
 On the right architecture where a word store isn't atomic enough and
 with the right compiler that decides to put p_sigmask on an address
 ending with 0xFFF with 4k-sized pages, we have two problems already.

If I read the code correctly, just adding an alignment constraint
to pool_init for the proc_pool would fix that 2nd part.



Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-19 Thread Ted Unangst
On Wed, Jun 19, 2013 at 14:19, Marc Espie wrote:
 On Wed, Jun 19, 2013 at 01:40:19PM +0200, Martin Pelikan wrote:
  If you're right that atomic_{clear,set}bits_int is correct and
  sufficient and actually faster, then all dynamic executables would
  benefit from this speedup (sigprocmask is used in ld.so(1)).

 Since on i386 GENERIC these atomic_* things don't emit the LOCK prefix,
 performance shouldn't be an issue; I'm actually more worried about this
 bit:

   -p-p_sigmask = mask ~ sigcantmask;
   +p-p_sigmask = mask;

 On the right architecture where a word store isn't atomic enough and
 with the right compiler that decides to put p_sigmask on an address
 ending with 0xFFF with 4k-sized pages, we have two problems already.
 
 If I read the code correctly, just adding an alignment constraint
 to pool_init for the proc_pool would fix that 2nd part.

pool will never give you an object that spans pages (unless the object
itself is larger than a page). p_sigmask will always be aligned
nicely. i mean, if it weren't, that code already would crash.



Re: help X11 performance: make sigprocmask(2) SY_NOLOCK

2013-06-19 Thread Miod Vallat
   - p-p_sigmask = mask ~ sigcantmask;
   + p-p_sigmask = mask;
 
 On the right architecture where a word store isn't atomic enough and
 with the right compiler that decides to put p_sigmask on an address
 ending with 0xFFF with 4k-sized pages, we have two problems already.

Holy Pumpkin forbid, struct proc layout is sane enough for fields to be
aligned to their natural alignment, and the allocator will return
properly aligned structs as well.

 I'm only asking if such a situation can happen, or if there is some
 ensure_this_assignment_is_always_atomic(p-p_sigmask, mask); function
 that I missed.

If such a function exists, it should be shot, if only for having a too
long name.

Miod