Re: crashme fault

2007-09-17 Thread Randy Dunlap
On Mon, 17 Sep 2007 07:53:50 -0700 (PDT) Linus Torvalds wrote: > On Mon, 17 Sep 2007, Randy Dunlap wrote: > > > > OK, I haven't done the microcode update yet. I ran crashme overnight > > with your newer patch and it crashed: > > Well, duh. > > That's because I forgot to do the "error_code &

Re: crashme fault

2007-09-17 Thread Linus Torvalds
On Mon, 17 Sep 2007, Randy Dunlap wrote: > > OK, I haven't done the microcode update yet. I ran crashme overnight > with your newer patch and it crashed: Well, duh. That's because I forgot to do the "error_code & PF_USER" => "user_mode_vm(regs)" thing in the most common case - the

Re: crashme fault

2007-09-17 Thread Randy Dunlap
Linus Torvalds wrote: On Sun, 16 Sep 2007, Randy Dunlap wrote: I'll test this overnight on 2.6.23-rc6-git2 since that was failing. I haven't been able to reproduce the fault on 2.6.21 after several hours of testing. I'll also test a microcode update to see if it helps. Before you do the

Re: crashme fault

2007-09-17 Thread Linus Torvalds
On Mon, 17 Sep 2007, Randy Dunlap wrote: OK, I haven't done the microcode update yet. I ran crashme overnight with your newer patch and it crashed: Well, duh. That's because I forgot to do the "error_code & PF_USER" => "user_mode_vm(regs)" thing in the most common case - the

Re: crashme fault

2007-09-17 Thread Randy Dunlap
On Mon, 17 Sep 2007 07:53:50 -0700 (PDT) Linus Torvalds wrote: On Mon, 17 Sep 2007, Randy Dunlap wrote: OK, I haven't done the microcode update yet. I ran crashme overnight with your newer patch and it crashed: Well, duh. That's because I forgot to do the "error_code & PF_USER" =>

Re: crashme fault

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Randy Dunlap wrote: > > I'll test this overnight on 2.6.23-rc6-git2 since that was failing. > > I haven't been able to reproduce the fault on 2.6.21 after several > hours of testing. > > I'll also test a microcode update to see if it helps. Before you do the microcode

Re: crashme fault

2007-09-16 Thread Randy Dunlap
On Sun, 16 Sep 2007 11:12:23 -0700 (PDT) Linus Torvalds wrote: > > > On Sun, 16 Sep 2007, Linus Torvalds wrote: > > > > I'm really starting to suspect some early EM64T bug, and I also suspect > > that it's harmless but that we should just do the trivial patch to say "if > > the register

Re: crashme fault

2007-09-16 Thread Andi Kleen
On Sun, Sep 16, 2007 at 10:14:46AM -0700, Linus Torvalds wrote: > > > On Sun, 16 Sep 2007, Randy Dunlap wrote: > > > > I'll apply this patch today, but I haven't done so yet (for the 2 > > bug reports below). > > Actually, it's probably better that you don't change your situation >

Re: crashme fault

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Linus Torvalds wrote: > > I'm really starting to suspect some early EM64T bug, and I also suspect > that it's harmless but that we should just do the trivial patch to say "if > the register state is in user mode, we don't care if the CPU says it was a > kernel access".
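The rule quoted above ("if the register state is in user mode, we don't care if the CPU says it was a kernel access") might look roughly like the following. This is only a sketch for illustration, not the patch from the thread: `struct fake_regs`, `fake_user_mode()` and `fault_from_user()` are hypothetical names, and the privilege check assumes the usual x86 convention that the low two bits of CS hold the privilege level. PF_USER is bit 2 of the x86 page-fault error code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of the x86 page-fault error code: set when the CPU reports
 * that the faulting access was made from user mode. */
#define PF_USER (1u << 2)

/* Hypothetical stand-in for the saved register state; only the code
 * segment selector matters for this sketch. */
struct fake_regs {
    uint16_t cs;
};

/* Assumption: the low two bits of CS hold the privilege level,
 * and level 3 means user mode. */
static bool fake_user_mode(const struct fake_regs *regs)
{
    return (regs->cs & 3) == 3;
}

/* Decide whether a fault should be treated as a user-mode fault.
 * The saved register state wins: if it says user mode, the error
 * code's claim of a kernel access is ignored. */
static bool fault_from_user(const struct fake_regs *regs,
                            unsigned long error_code)
{
    if (fake_user_mode(regs))
        return true;
    return (error_code & PF_USER) != 0;
}
```

With a user-mode selector such as 0x33, `fault_from_user()` returns true even when the error code has PF_USER clear, which is the mismatch the thread suspects on early EM64T parts.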

Re: crashme fault

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Randy Dunlap wrote: > > I'll apply this patch today, but I haven't done so yet (for the 2 > bug reports below). Actually, it's probably better that you don't change your situation unnecessarily, in case the bug goes away. Since you are triggering the problem even

Re: crashme fault

2007-09-16 Thread Randy Dunlap
On Sat, 15 Sep 2007 17:34:54 -0700 (PDT) Linus Torvalds wrote: > > > On Sat, 15 Sep 2007, Randy Dunlap wrote: > > Command: ./crashme +2000 666 1000 1:00:00 1 > > Ok, that's close to what I was testing (one of the examples from the > crashme docs). > > > > The original gjc crashme doesn't

Re: crashme fault

2007-09-16 Thread Randy Dunlap
On Sun, 16 Sep 2007 17:53:21 +0200 Andrea Arcangeli wrote: > On Wed, Sep 12, 2007 at 10:21:51PM -0700, Randy Dunlap wrote: > > I run almost-daily kernel testing. I haven't seen 'crashme' cause a > > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, > > Did the room

Re: crashme fault

2007-09-16 Thread Andrea Arcangeli
On Wed, Sep 12, 2007 at 10:21:51PM -0700, Randy Dunlap wrote: > I run almost-daily kernel testing. I haven't seen 'crashme' cause a > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, Did the room temperature change in the server room? ;) Those early EM64T P4 core based

Re: crashme fault

2007-09-16 Thread Andrea Arcangeli
On Wed, Sep 12, 2007 at 10:21:51PM -0700, Randy Dunlap wrote: I run almost-daily kernel testing. I haven't seen 'crashme' cause a kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, Did the room temperature change in the server room? ;) Those early EM64T P4 core based are

Re: crashme fault

2007-09-16 Thread Randy Dunlap
On Sun, 16 Sep 2007 17:53:21 +0200 Andrea Arcangeli wrote: On Wed, Sep 12, 2007 at 10:21:51PM -0700, Randy Dunlap wrote: I run almost-daily kernel testing. I haven't seen 'crashme' cause a kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, Did the room temperature

Re: crashme fault

2007-09-16 Thread Randy Dunlap
On Sat, 15 Sep 2007 17:34:54 -0700 (PDT) Linus Torvalds wrote: On Sat, 15 Sep 2007, Randy Dunlap wrote: Command: ./crashme +2000 666 1000 1:00:00 1 Ok, that's close to what I was testing (one of the examples from the crashme docs). The original gjc crashme doesn't even do a

Re: crashme fault

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Randy Dunlap wrote: I'll apply this patch today, but I haven't done so yet (for the 2 bug reports below). Actually, it's probably better that you don't change your situation unnecessarily, in case the bug goes away. Since you are triggering the problem even *without*

Re: crashme fault

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Linus Torvalds wrote: I'm really starting to suspect some early EM64T bug, and I also suspect that it's harmless but that we should just do the trivial patch to say "if the register state is in user mode, we don't care if the CPU says it was a kernel access". Namely

Re: crashme fault

2007-09-16 Thread Andi Kleen
On Sun, Sep 16, 2007 at 10:14:46AM -0700, Linus Torvalds wrote: On Sun, 16 Sep 2007, Randy Dunlap wrote: I'll apply this patch today, but I haven't done so yet (for the 2 bug reports below). Actually, it's probably better that you don't change your situation unnecessarily, in case

Re: crashme fault

2007-09-16 Thread Randy Dunlap
On Sun, 16 Sep 2007 11:12:23 -0700 (PDT) Linus Torvalds wrote: On Sun, 16 Sep 2007, Linus Torvalds wrote: I'm really starting to suspect some early EM64T bug, and I also suspect that it's harmless but that we should just do the trivial patch to say if the register state is in user

Re: crashme fault

2007-09-16 Thread Linus Torvalds
On Sun, 16 Sep 2007, Randy Dunlap wrote: I'll test this overnight on 2.6.23-rc6-git2 since that was failing. I haven't been able to reproduce the fault on 2.6.21 after several hours of testing. I'll also test a microcode update to see if it helps. Before you do the microcode update,

Re: crashme fault

2007-09-15 Thread Andi Kleen
On Sat, Sep 15, 2007 at 03:47:19PM -0700, Linus Torvalds wrote: > > > On Sat, 15 Sep 2007, Linus Torvalds wrote: > > > > So regardless of whether we want to trust "user_mode(regs)" more than > > "error_code & PF_USER", it would definitely be very interesting if you can > > give a good "this

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Randy Dunlap wrote: > Command: ./crashme +2000 666 1000 1:00:00 1 Ok, that's close to what I was testing (one of the examples from the crashme docs). > > The original gjc crashme doesn't even do a "mprotect(PROT_EXEC)" by default > > (nor does it even compile on a modern

Re: crashme fault

2007-09-15 Thread Randy Dunlap
Linus Torvalds wrote: On Sat, 15 Sep 2007, Linus Torvalds wrote: So regardless of whether we want to trust "user_mode(regs)" more than "error_code & PF_USER", it would definitely be very interesting if you can give a good "this is where it started happening". Also, can you point to good

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Linus Torvalds wrote: > > So regardless of whether we want to trust "user_mode(regs)" more than > "error_code & PF_USER", it would definitely be very interesting if you can > give a good "this is where it started happening". Also, can you point to good crashme sources,

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Linus Torvalds wrote: > > Here's a really *stupid* patch (and untested too, btw) to see if it gets > easier to debug when you don't oops, just print the register state > instead. Side note - while thinking about this, I'm wondering whether maybe that "stupid" patch

Re: crashme fault

2007-09-15 Thread Randy Dunlap
Linus Torvalds wrote: On Sat, 15 Sep 2007, Randy Dunlap wrote: Had another on recent last night (probably not helpful): At least the original "crashme" would write its random number seeds to a logfile each time (and I made it fsync it in some versions), which meant that once a crash

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Randy Dunlap wrote: > > Had another on recent last night (probably not helpful): At least the original "crashme" would write its random number seeds to a logfile each time (and I made it fsync it in some versions), which meant that once a crash happened, you could
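The seed-logging idea described above (write the seed to a logfile and fsync it before running the random code, so the seed survives a crash) could be sketched as follows. This is not taken from the crashme sources: `log_seed()`, the log line format, and the file path are assumptions made for illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Append one seed to the logfile and force it to stable storage
 * before returning, so the record survives a later kernel crash.
 * Returns 0 on success, -1 on any failure. */
static int log_seed(const char *path, unsigned long seed)
{
    char line[64];
    int len = snprintf(line, sizeof(line), "seed %lu\n", seed);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd < 0)
        return -1;

    /* Write the record, then fsync so it is not left only in the
     * page cache when the machine goes down. */
    if (write(fd, line, (size_t)len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A test harness would call `log_seed()` immediately before each run; after a crash, the last line of the logfile names the seed to replay.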

Re: crashme fault

2007-09-15 Thread Randy Dunlap
Andi Kleen wrote: Andi, anything comes to mind? No, unfortunately not. There weren't any changes to entry.S recently that could corrupt the error code as far as I remember. Also cannot think of something else. A version where it started happening would be useful. I'll begin testing older

Re: crashme fault

2007-09-15 Thread Andi Kleen
> Andi, anything comes to mind? No, unfortunately not. There weren't any changes to entry.S recently that could corrupt the error code as far as I remember. Also cannot think of something else. A version where it started happening would be useful. -Andi

Re: crashme fault

2007-09-15 Thread Andi Kleen
Andi, anything comes to mind? No, unfortunately not. There weren't any changes to entry.S recently that could corrupt the error code as far as I remember. Also cannot think of something else. A version where it started happening would be useful. -Andi

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Randy Dunlap wrote: Had another on recent last night (probably not helpful): At least the original crashme would write its random number seeds to a logfile each time (and I made it fsync it in some versions), which meant that once a crash happened, you could re-produce

Re: crashme fault

2007-09-15 Thread Randy Dunlap
Linus Torvalds wrote: On Sat, 15 Sep 2007, Randy Dunlap wrote: Had another on recent last night (probably not helpful): At least the original crashme would write its random number seeds to a logfile each time (and I made it fsync it in some versions), which meant that once a crash

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Linus Torvalds wrote: Here's a really *stupid* patch (and untested too, btw) to see if it gets easier to debug when you don't oops, just print the register state instead. Side note - while thinking about this, I'm wondering whether maybe that stupid patch might not

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Linus Torvalds wrote: So regardless of whether we want to trust "user_mode(regs)" more than "error_code & PF_USER", it would definitely be very interesting if you can give a good "this is where it started happening". Also, can you point to good crashme sources, and give the

Re: crashme fault

2007-09-15 Thread Randy Dunlap
Linus Torvalds wrote: On Sat, 15 Sep 2007, Linus Torvalds wrote: So regardless of whether we want to trust "user_mode(regs)" more than "error_code & PF_USER", it would definitely be very interesting if you can give a good "this is where it started happening". Also, can you point to good crashme

Re: crashme fault

2007-09-15 Thread Linus Torvalds
On Sat, 15 Sep 2007, Randy Dunlap wrote: Command: ./crashme +2000 666 1000 1:00:00 1 Ok, that's close to what I was testing (one of the examples from the crashme docs). The original gjc crashme doesn't even do a mprotect(PROT_EXEC) by default (nor does it even compile on a modern unix),

Re: crashme fault

2007-09-15 Thread Andi Kleen
On Sat, Sep 15, 2007 at 03:47:19PM -0700, Linus Torvalds wrote: On Sat, 15 Sep 2007, Linus Torvalds wrote: So regardless of whether we want to trust "user_mode(regs)" more than "error_code & PF_USER", it would definitely be very interesting if you can give a good "this is where it

Re: crashme fault

2007-09-14 Thread Randy Dunlap
On Fri, 14 Sep 2007 22:05:17 -0700 Randy Dunlap wrote: > On Fri, 14 Sep 2007 21:28:12 -0700 (PDT) Linus Torvalds wrote: > > > On Wed, 12 Sep 2007, Randy Dunlap wrote: > > > > > > I run almost-daily kernel testing. I haven't seen 'crashme' cause a > > > kernel fault until today, and now I've

Re: crashme fault

2007-09-14 Thread Randy Dunlap
On Fri, 14 Sep 2007 21:28:12 -0700 (PDT) Linus Torvalds wrote: > On Wed, 12 Sep 2007, Randy Dunlap wrote: > > > > I run almost-daily kernel testing. I haven't seen 'crashme' cause a > > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, > > x86_64. After the first fault, I

Re: crashme fault

2007-09-14 Thread Linus Torvalds
On Wed, 12 Sep 2007, Randy Dunlap wrote: > > I run almost-daily kernel testing. I haven't seen 'crashme' cause a > kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, > x86_64. After the first fault, I ran 'crashme' about 10 more times > to get the second fault (usually

Re: crashme fault

2007-09-14 Thread Linus Torvalds
On Wed, 12 Sep 2007, Randy Dunlap wrote: I run almost-daily kernel testing. I haven't seen 'crashme' cause a kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, x86_64. After the first fault, I ran 'crashme' about 10 more times to get the second fault (usually for 10

Re: crashme fault

2007-09-14 Thread Randy Dunlap
On Fri, 14 Sep 2007 21:28:12 -0700 (PDT) Linus Torvalds wrote: On Wed, 12 Sep 2007, Randy Dunlap wrote: I run almost-daily kernel testing. I haven't seen 'crashme' cause a kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, x86_64. After the first fault, I ran

Re: crashme fault

2007-09-14 Thread Randy Dunlap
On Fri, 14 Sep 2007 22:05:17 -0700 Randy Dunlap wrote: On Fri, 14 Sep 2007 21:28:12 -0700 (PDT) Linus Torvalds wrote: On Wed, 12 Sep 2007, Randy Dunlap wrote: I run almost-daily kernel testing. I haven't seen 'crashme' cause a kernel fault until today, and now I've seen it twice on

crashme fault

2007-09-12 Thread Randy Dunlap
I run almost-daily kernel testing. I haven't seen 'crashme' cause a kernel fault until today, and now I've seen it twice on 2.6.23-rc6-git2, x86_64. After the first fault, I ran 'crashme' about 10 more times to get the second fault (usually for 10 minutes, one time for 30 minutes). [This is
