Re: 1023rd thread crashes 2.4.0-test8 from non-root user (fwd)

2000-09-26 Thread Ted Deppner

On Mon, Sep 25, 2000 at 03:02:05PM -0700, Linus Torvalds wrote:
>   sigdelset(&list->signal, sig);

I just tested this using my perl-5.005-threads program... no change from
my last email (only 1023 threads created, program fails to respond to
ctrl-c when more than 1023 threads are attempted).  This _appears_ to be a
bug in perl-5.005-threads as shipped with debian potato.

Using Mark Hahn's test code, I get all 2000 threads successfully created,
and they respond properly when killed via ctrl-c.  So that appears to fix
the problem.

ASSUMING the perl-5.005-thread problem is indeed a perl problem, I think
this solves the kernel crash problem.  (NOTE, I have tested this with
max_queued_signals at 4096 and 1024... no difference for either perl or
Mark's code.)

I'll get the source to perl-5.005-thread and play with it later tonight.

-- 
Ted Deppner
http://www.psyber.com/~ted/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 1023rd thread crashes 2.4.0-test8 from non-root user

2000-09-25 Thread Ted Deppner

On Mon, Sep 25, 2000 at 10:33:06AM +0200, Ingo Molnar wrote:
> On Mon, 25 Sep 2000, Ted Deppner wrote:
> 
> > I ask because on my perl-threads test case, I can't create more than 1023
> > threads, but I get a kernel crash when I've _attempted_ to create more
> > than 1023 and hit ctrl-c.
> 
> could you test this with the kernel/signal.c:max_queued_signals
> initialization change i suggested? Does it still crash?

With max_queued_signals=4096, I can still only create 1022 threads under
perl-5.005-threads.  

With more than 1023 threads the process no longer responds to ctrl-c, or a
kill -INT on it.  A kill -9 will kill it however with no kernel lockup.

Under 1023 threads the process responds to ctrl-c.

It seems like the bug is definitely involved in signal handling, and that
max_queued_signals affects it in some way...


My ulimit -a from bash... you can see open files at 1024, but I'm not
doing open files stuff in my test program (threadcrash.pl).

core file size (blocks)     0
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
max locked memory (kbytes)  unlimited
max memory size (kbytes)    unlimited
open files                  1024
pipe size (512 bytes)       8
stack size (kbytes)         8192
cpu time (seconds)          unlimited
max user processes          4093
virtual memory (kbytes)     unlimited

I upped my open-files to 2048 and still was unable to get more than 1022
threads running.  I wonder if perl-5.005-threads might have a static limit
set somewhere inside it.  Maybe I'll try to recompile it tonight and see
what happens.

-- 
Ted Deppner
http://www.psyber.com/~ted/
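
[Editorial sketch] The limits listed by ulimit -a above can also be read from inside a test program, which makes it easier to rule out a per-process limit when thread creation stops near 1023. The following is a minimal sketch, assuming a Linux system with glibc; the helper names print_limit and print_thread_related_limits are hypothetical and are not part of threadcrash.pl or any code posted in this thread.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print the soft limit for one resource; returns 0 on success, -1 on error. */
int print_limit(const char *name, int resource) {
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0) {
        perror(name);
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("%-22s unlimited\n", name);
    else
        printf("%-22s %llu\n", name, (unsigned long long)rl.rlim_cur);
    return 0;
}

/* Print the limits most likely to cap thread creation (hypothetical helper). */
int print_thread_related_limits(void) {
    int rc = 0;

    rc |= print_limit("open files", RLIMIT_NOFILE);
    rc |= print_limit("max user processes", RLIMIT_NPROC);
    rc |= print_limit("stack size (bytes)", RLIMIT_STACK);
    return rc;
}
```

Calling print_thread_related_limits() at the start of a test run records the limits in effect for that run. Note that getrlimit reports the stack size in bytes, whereas the ulimit output above shows kbytes.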




Re: 1023rd thread crashes 2.4.0-test8 from non-root user (fwd)

2000-09-25 Thread Linus Torvalds


Duh. This was a really stupid bug.

In kernel/signal.c, collect_signal(), for the case where we don't find a
siginfo block, we need to clear the signal set.

In short, add the line

sigdelset(&list->signal, sig);

just before the first "return 1" in collect_signal(), and all should be
well (famous last words - it's untested, but I'm sure that's it).

If I'm right, the kernel didn't properly crash, but it would send the
signal on and on again forever, which would basically kill the machine if
something like init or X or a number of other important cases got stuck
doing nothing.

Linus
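
[Editorial sketch] The failure mode described above can be illustrated with a small userspace model. This is not the kernel's collect_signal(); it is a simplified sketch, assuming a pending bitmask plus a per-signal count of queued siginfo blocks. The struct pending layout and the with_fix flag are hypothetical and exist only for illustration. Without the fix, a signal that was posted while the queue was full stays pending after every collection, so it is delivered again and again.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SIG 64

/* Hypothetical, simplified stand-in for a task's pending-signal state. */
struct pending {
    unsigned long long signal;  /* bit (sig - 1) set => sig is pending */
    int queued[MAX_SIG + 1];    /* queued siginfo blocks per signal number */
};

unsigned long long sig_bit(int sig) {
    assert(sig >= 1 && sig <= MAX_SIG);
    return 1ULL << (sig - 1);
}

/* Mark sig pending; when the queue is full, no siginfo block is stored. */
void post_signal(struct pending *p, int sig, bool queue_full) {
    p->signal |= sig_bit(sig);
    if (!queue_full)
        p->queued[sig]++;
}

/* Returns 1 if an instance of sig was collected, 0 if none was pending. */
int collect_signal(struct pending *p, int sig, bool with_fix) {
    if (!(p->signal & sig_bit(sig)))
        return 0;
    if (p->queued[sig] > 0) {
        /* Found a siginfo block: consume it, clear the bit on the last one. */
        if (--p->queued[sig] == 0)
            p->signal &= ~sig_bit(sig);
        return 1;
    }
    /* No siginfo block: the signal was posted while the queue was full. */
    if (with_fix)
        p->signal &= ~sig_bit(sig);  /* models sigdelset(&list->signal, sig) */
    return 1;
}

/* Count how many times sig is delivered, giving up after cap deliveries. */
int count_deliveries(struct pending *p, int sig, bool with_fix, int cap) {
    int n = 0;

    while (n < cap && collect_signal(p, sig, with_fix))
        n++;
    return n;
}
```

In this model, posting one signal with queue_full set and then calling count_deliveries() with with_fix false runs until the cap is reached, while with_fix true yields exactly one delivery.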




Re: 1023rd thread crashes 2.4.0-test8 from non-root user

2000-09-25 Thread Ingo Molnar


btw., maybe it's init that gets those 2000 signals, not bash?

Ingo




Re: 1023rd thread crashes 2.4.0-test8 from non-root user

2000-09-25 Thread Ingo Molnar


indeed, after changing max_queued_signals to 4096, i cannot crash the
kernel anymore with 2000 threads.

Ingo




Re: 1023rd thread crashes 2.4.0-test8 from non-root user

2000-09-25 Thread Ingo Molnar


On Mon, 25 Sep 2000, Mark Hahn wrote:

> > The problem is large numbers of threads in 2.4.0-test8 can result in a
> > hard crash of the entire kernel.  This can be done as a non-root user.
> 
> this appears to be reproducible (128M duron, haven't tried intel UP/SMP):

i've done some experimentation, and to me it appears we overload the
queued signal limit of bash, or something like that? The Ctrl-C thing
definitely creates a lot of signals. And the default limit for queued
signals [kernel/signal.c:max_queued_signals] is 1024 ...

so i think this is threading-unrelated, to me it (tentatively) looks like
a signal handling bug.

Ingo
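
[Editorial sketch] The arithmetic behind the queue-limit theory can be made concrete with a toy model. It assumes a single shared counter of queued siginfo blocks with a hard cap, and that each of the threads receives one signal from Ctrl-C; whether the kernel's accounting works exactly this way is an assumption here, and the names struct tally and post_to_tasks are hypothetical.

```c
#include <assert.h>

/* How many tasks got a queued siginfo block vs. only a pending bit. */
struct tally {
    int queued;
    int bit_only;
};

/* Post one signal to each of ntasks tasks under a shared queue limit. */
struct tally post_to_tasks(int ntasks, int max_queued) {
    struct tally t = {0, 0};

    assert(ntasks >= 0 && max_queued >= 0);
    for (int i = 0; i < ntasks; i++) {
        if (t.queued < max_queued)
            t.queued++;     /* room left: a siginfo block is queued */
        else
            t.bit_only++;   /* queue full: only the pending bit is set */
    }
    return t;
}
```

Under these assumptions, post_to_tasks(2000, 1024) leaves 976 tasks with a pending bit and no siginfo block, while post_to_tasks(2000, 4096) leaves none, which is consistent with the crash going away once the limit is raised to 4096.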




Re: 1023rd thread crashes 2.4.0-test8 from non-root user

2000-09-24 Thread Mark Hahn

> The problem is large numbers of threads in 2.4.0-test8 can result in a
> hard crash of the entire kernel.  This can be done as a non-root user.

this appears to be reproducible (128M duron, haven't tried intel UP/SMP):

// code derived from a clone demo in lmbench.
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <sched.h>
#include <syscall.h>
#include <errno.h>
#include <sys/time.h>
#include <asm/atomic.h>

int
do_clone(void (*fn)(void *), void *data, char *stack) {
    long retval;

    *--(void**)stack = data;

    __asm__ __volatile__(
        "int $0x80\n\t"         /* Linux/i386 system call */
        "testl %0,%0\n\t"       /* check return value */
        "jne 1f\n\t"            /* jump if parent */
        "call *%3\n\t"          /* start subthread function */
        "movl %2,%0\n\t"
        "int $0x80\n"           /* exit system call: exit subthread */
        "1:\t"
        :"=a" (retval)
        :"0" (__NR_clone),"i" (__NR_exit),
         "r" (fn),
         "b" (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD),
         "c" (stack));

    if (retval < 0) {
        errno = -retval;
        retval = -1;
    }
    return retval;
}

atomic_t counter = ATOMIC_INIT(0);
atomic_t die = ATOMIC_INIT(0);

void
kid(void *data) {
    atomic_inc(&counter);
    while (!atomic_read(&die))
        sleep(1);
    exit(0);
}

double
gtod() {
    struct timeval tv;
    gettimeofday(&tv,0);
    return tv.tv_sec + 1e-6 * tv.tv_usec;
}

int
main() {
    const unsigned n = 2000;
    const int stackPerThread = 4096;
    char stack[n * stackPerThread];
    char *stacktop = stack + sizeof(stack) - 1;

    double before = gtod();
    for (unsigned i=0; i<n; i++) {
        if (do_clone(kid, (void*) "hey", stacktop) < 0) {
            perror("clone");
            exit(1);
        }
        stacktop -= 4096;
    }
    double elapsed = gtod() - before;
    printf("OK, created %d threads in %f seconds (%f/second)\n",
           n, elapsed, n/elapsed);
    printf("hit any key to tell them all to die..."); fflush(stdout);
    getchar();
    atomic_set(&die,1);
    for (int c=0; c<atomic_read(&counter); c++)
        wait(0);
    printf("OK, all dead\n");
    return 0;
}
