Hello again!

That patch unfortunately did not help me, but thanks anyway.

But I just found something interesting. The system is supposed to be
running a 2-threaded, niced rc5des process, and it produces the following
output (in chronological order):

longmorn:/proc# cat stat | head -3
cpu 880 109468 3086 14100
cpu0 430 57693 1805 3839
cpu1 450 51775 1281 10261
longmorn:/proc# cat stat | head -3
cpu 880 111785 3088 11781
cpu0 430 60010 1807 1520
cpu1 450 51775 1281 10261
-- waiting
longmorn:/proc# cat stat | head -3
cpu  881 160248 3092 4294930609
cpu0 431 108473 1811 4294920348
cpu1 450 51775 1281 10261

Don't these values represent time? Shouldn't time only run in one
direction?
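
The huge numbers in the last sample sit just below 2^32, so they look
like small negative values printed as unsigned 32-bit integers. A minimal
sketch in C to check that reading, assuming the usual column order
(user, nice, system, idle); the struct and function names are made up
for illustration:

```c
#include <stdint.h>

/* One "cpu" line of /proc/stat as printed above.
 * Column order user, nice, system, idle is an assumption. */
struct cpu_sample {
    uint32_t user, nice, system, idle;
};

/* Reinterpret a printed counter as a signed 32-bit value
 * without relying on implementation-defined casts. */
int32_t as_signed(uint32_t v)
{
    if (v <= (uint32_t)INT32_MAX)
        return (int32_t)v;
    return -(int32_t)(UINT32_MAX - v) - 1;
}

/* Sum of the four columns, wrapping modulo 2^32. */
uint32_t total_ticks(struct cpu_sample s)
{
    return (uint32_t)(s.user + s.nice + s.system + s.idle);
}
```

Fed with the values above, as_signed() turns 4294920348 into -46948 and
4294930609 into -36687, and total_ticks() gives 63767 for cpu0 in all
three samples, the same total as for cpu1. That would fit an idle figure
computed by subtracting the busy counters from a tick count that has
stopped advancing, but that is only my guess; I have not checked the
kernel source.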

longmorn:/proc# ps aux | grep rc5
thorburn  145 99.0  0.2  756  360 ttyp0 RN 22:44 16:27 ./rc5des
thorburn  146 46.7  0.2  756  360 ttyp0 RN 22:44  4:42 ./rc5des

Repeating this command shows that 145 is still running (its 16:27 keeps
increasing) but 146 is dead. 99.9% CPU? Is one process running on both
processors? The user thorburn is just an ordinary user (no root, no sudo).

At this point the system is responsive if I am careful. That is, I can
run "umount -ar" as root (which shortens the reboot time a lot ;-). top is
a bad idea, as is "shutdown -r now", which starts a continuous beep. This
state is also very reproducible (unfortunately). Yes, I am writing this
mail on another computer.

My problem definitely seems to be related to system load... The system
had been up for an hour and a half (just running xmahjongg) with no
problem, and then died 10 seconds after I started rc5des (100% nice load).

This makes me believe less in a hardware failure... I don't even feel I
can blame my onboard Adaptec 7895 for that (?).

Sometimes I get the "stuck on TLB IPI wait (CPU#0)" message when the
machine crashes, but usually I do not see it.

Also I found that lpd consumed a lot of CPU time (during normal work, not
after a crash). Not starting lpd at all and not loading that module
(probably) increased stability a bit, and I got rid of the ghost load. No,
I don't have a printer... This problem did not occur in 2.2.7 SMP. This
seems like a bug to me...

The strange thing is that I averaged several hours of uptime with 100%
load (rc5des) last week, but now I can't even think about rc5. I've used
2.2.9 the whole time. I don't think I have changed anything...

I am running Slackware 4.0 (beta from 990429, but nothing that should
affect this stuff changed in the "stable" release). Could glibc5 be the
problem (I know almost nothing about this stuff), or anything else with
the distribution?

Any suggestions would be very much appreciated. I will gladly supply you
(whoever) with more data if needed. Unfortunately my programming skills do
not allow any kernel-hacking.

Sorry for writing such a long mail...

  sincerely Gunnar Thorburn, [EMAIL PROTECTED]

#####

On Mon, 31 May 1999, George wrote:

> On Mon, 31 May 1999, Gunnar Thorburn wrote:
> 
> >I am wondering exactly the same thing because this error causes my machine
> >to crash too (in quite the same way as yours).
> 
> It's caused by some subtle SMP thing.  Here's a patch which fixes the
> symptom for me.  It basically reverts the changes to page_alloc.c in
> 2.2.0-pre6 to 2.2.0-pre7.
> 
> My test involved multi-threaded applications causing the system to run out
> of memory.  With 128MB of RAM it died barely getting into swap.  At 32MB I
> couldn't kill it and at 64MB it died around 50MB into swap.  So at least
> with my test the problem was memory oriented.
> 
> It's not the right fix, but it may help. I see better swapping with this
> anyway.
> 
> diff -u ../linux/mm/page_alloc.c ./linux/mm/page_alloc.c
> --- ../linux/mm/page_alloc.c  Wed May 12 16:38:15 1999
> +++ ./linux/mm/page_alloc.c   Mon May 31 15:11:04 1999
> @@ -189,8 +189,6 @@
>       atomic_set(&map->count, 1); \
>  } while (0)
>  
> -int low_on_memory = 0;
> -
>  unsigned long __get_free_pages(int gfp_mask, unsigned long order)
>  {
>       unsigned long flags;
> @@ -214,19 +212,14 @@
>        * do our best to just allocate things without
>        * further thought.
>        */
> -     if (!(current->flags & PF_MEMALLOC)) {
> +     if (current->flags & PF_MEMALLOC)
> +             goto ok_to_allocate;
> +     else {
>               int freed;
>  
> -             if (nr_free_pages > freepages.min) {
> -                     if (!low_on_memory)
> -                             goto ok_to_allocate;
> -                     if (nr_free_pages >= freepages.high) {
> -                             low_on_memory = 0;
> -                             goto ok_to_allocate;
> -                     }
> -             }
> +             if (nr_free_pages > freepages.low)
> +                     goto ok_to_allocate;
>  
> -             low_on_memory = 1;
>               current->flags |= PF_MEMALLOC;
>               freed = try_to_free_pages(gfp_mask);
>               current->flags &= ~PF_MEMALLOC;
> 
> -George Greer
> 
> -
> Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
> To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]
> 

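The two allocation checks in George's patch can be compared outside the
kernel. A rough user-space sketch in C, assuming the PF_MEMALLOC case is
left out; the struct, the function names and any threshold values fed to
it are hypothetical and only model the "-" and "+" lines of the diff:

```c
#include <stdbool.h>

/* Stand-in for the kernel's freepages.min/.low/.high. */
struct thresholds {
    unsigned long min, low, high;
};

/* Check removed by the patch (the "-" lines): once low_on_memory
 * is set, reclaim continues until free pages reach t->high. */
bool must_reclaim_229(unsigned long nr_free,
                      const struct thresholds *t,
                      bool *low_on_memory)
{
    if (nr_free > t->min) {
        if (!*low_on_memory)
            return false;            /* ok_to_allocate */
        if (nr_free >= t->high) {
            *low_on_memory = false;  /* recovered */
            return false;            /* ok_to_allocate */
        }
    }
    *low_on_memory = true;
    return true;                     /* try_to_free_pages() */
}

/* Check restored by the patch (the "+" lines): a single,
 * stateless comparison against t->low. */
bool must_reclaim_patched(unsigned long nr_free,
                          const struct thresholds *t)
{
    return !(nr_free > t->low);
}
```

With made-up thresholds min=64, low=128, high=192, the removed check lets
an allocation at 100 free pages through while the flag is clear, but after
one dip to 50 it keeps reclaiming at 150 and only stops at 192 or more.
The restored check reclaims at 100 and allocates freely at 150, with no
memory of earlier calls.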
