On Wednesday 25 September 2002 07:25 pm, you wrote:
> On Wednesday September 25 2002 08:25 am, Marcia wrote:
> > Dear Tom
> >
> > Tom Brinkman wrote:
> > >>Since the 'nopentium' bandaid didn't fix it, let's start again
> > >>Marcia. List the hardware involved, particularly mobo, psu, video,
> > >> and what Mandrake version, which video drivers are used. Ram
> > >> vendor, if you know? IIRC, it's Mdk 8.2, with an ECS mobo. Got
> > >> the model/ revision/bios vendor and numbers?
> >
> > The link for my board is http://www.ecsusa.com/ and my motherboard is
> > the L7VMM.
>
> AMD apprv'd for your 1600+, unfortunately, I have no experience with
> these new micro-boards (an' I'm not an ECS fan). The latest bios is 1.0a
> http://www.ecsusa.com/ecsusa/www.ecs.com.tw/download/l7vmm.htm
> "1. Remove "CPU warning temp item" in BIOS setup
> The ITE8705 chipset use the same high and low limit for "CPU warning
> temp & CPU shutdown temp"
> 2. To fix Hynix 128M X 2 or Samsung 128M X 2 system will auto-restart
> when running"
> .... either fix could be pertinent to your crash problem, so update if
> you don't already have 1.0a. Both are worrisome in that they deal with
> auto shutdowns (crashes), one for temp, the other for ram.
I will update the bios then for starters. I have never done this so what is
the procedure for doing this?
>
> > I disabled the onboard lan because even though it worked
> > it was grabbing the same irq as sound. The company sent me a new lan
> > card which helped that it seems. This is an Athlon 1600+ XP with
> > 512MB PC2100 DDR, 266 MHZ SDRAM,
>
> Yes, but who makes the ram? Two important points, the actual ram
> chips and the pcb (board) implementation of the chips. IOW, Micron
> chips (good) on a generic pcb (bad) ... well two wrongs don't make a
> right ;> Look in bios and see what the ram timings are. The most
> lenient are CAS 3-3-3, and if there's a setting for 'bank
> interleaving', disable it. At least while we're tryin to get your crash
> problem solved, go for lenient. 2-2-2 and 4-bank are the optimum, but
> only good ram on a good mobo with a good PSU can do it.
> Also it's 133 MHz x2 ram. (the x2, and DDR are mostly marketing talk)
>
> Probly now's a good time to run the machine overnite booting to
> memtest86. Look on your CD's, or use SoftwareManager, you should find
> somethin like memtest86-3.0-2mdk . Install that rpm, it'll add a
> memtest86 boot option to lilo (or grub). When you re-boot, choose this
> option and let the tests run overnite.
>
> Plan B, if your machine doesn't like booting this option, then look
> in /boot. After installin the memtest rpm you'll see a file like
> memtest-3.0.bin. So put in a good floppy and type
> 'dd if=/boot/memtest-3.0.bin of=/dev/fd0' (caution your memtest version
> is probly different than mine). That'll make a memtest86 floppy you
> can boot from. Just choose 'floppy' from lilo. If you can't run
> memtest86 overnite with -0- errors, then we probly have found the
> problem ... the ram, or how well your motherboard gets along with it,
> or both. Could still be PSU tho.
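A minimal sketch of Plan B with a read-back check added, so a bad floppy doesn't get mistaken for bad ram. The function name write_image is made up for illustration, and the image path in the usage line is the one quoted above, which will probably differ on your system.

```shell
# write_image IMAGE DEVICE
# Copy IMAGE onto DEVICE with dd, then read the same number of bytes
# back from DEVICE and compare them against IMAGE.
# (write_image is a hypothetical helper name, not part of any package.)
write_image() {
    img=$1
    dev=$2
    [ -r "$img" ] || { echo "cannot read $img" >&2; return 1; }
    # $(( )) strips any padding some wc versions put around the count
    size=$(($(wc -c < "$img")))
    dd if="$img" of="$dev" bs=512 conv=notrunc 2>/dev/null || return 1
    sync
    # The floppy is larger than the image, so compare only the first
    # $size bytes of the device.
    head -c "$size" "$dev" | cmp -s - "$img"
}
```

As root, something like `write_image /boot/memtest-3.0.bin /dev/fd0 && echo "floppy verified"` would do it. The read-back can be served from the kernel's cache, so ejecting and re-inserting the disk before a second compare is the more honest test.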
>
> > I had the cooler master added plus
> > an extra case fan. This is a brand new machine. I have Win95 as a
> > dual boot and Win does not have the problems that my Linux side has.
>
> Win9.x --> WinXP tolerates sloppy (win)hardware, actually encourages
> it IMO. Most all CoolerMaster hs/fans are AMD appr'vd, so we probly
> don't need to look there. I'd advise you tho, that it's probly usin a
> thermal pad to contact the cpu's die, and this will deteriorate over
> time, might even fail. Thermal grease is much better, now and later.
>
> > cat /proc/interrupts
> >
> > 11: 154 XT-PIC usb-uhci, usb-uhci
>
> What USB devices do you have? Appears two are sharing IRQ11 or it's
> possibly a double entry. Everything else looked good.
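For picking shared lines out of a longer listing, here is a small sketch that prints only the IRQ rows with more than one handler registered. It assumes the layout shown above, where shared handlers are separated by commas; shared_irqs is just an illustrative name.

```shell
# shared_irqs: read /proc/interrupts-style text on stdin and print the
# rows for numbered IRQs that list more than one handler.
# Assumes handlers sharing a line are comma-separated, as in the
# "usb-uhci, usb-uhci" row quoted above.
shared_irqs() {
    awk '$1 ~ /^[0-9]+:$/ && /,/'
}
```

Run it as `shared_irqs < /proc/interrupts`. A row showing up here only means the line is shared, which is often harmless; it is a hint about where to look, not proof of a conflict.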
I have a usb HP 4300 scanjet scanner and a HP 940c usb printer.
>
> > There is a temperature and performance utility in the bios. What are
> > lm_sensors/gkrellm? I would gladly install this if needed.
>
> Most common causes of random, occasional lockups and reboots are
> faulty ram, or overheating. Even a lot of Windoze problems get blamed
> on M$, when these two culprits are really at fault (specially Winsux
> Registry errors).
>
> The temp you see in bios is really only good for verifying that you
> have hardware support for temp, voltage, fan monitoring. When you see
> this temp the system is not under load, and usually is comin from a
> cool state. Specially if it's been off for more'n just a few seconds.
> Processor core temp is _very_ dynamic. Also there's only a very few
> current mobo's that can really access AthlonXP internal diode core
> temps (Asus, Gigabyte). All other boards, including yours an' mine,
> measure the temp from an external probe. 'Bout like tryin to see if the
> electric wires inside a wall are too hot, by holding your hand against
> the sheetrock. Still it's somethin to go by. Figure your cpu core temp
> is 10 to 20C hotter than the probe reports tho.
>
> So we need lm_sensors. It's on your CD's, install
> liblm_sensors1-2.6.4-4mdk
> lm_sensors-2.6.4-4mdk ...or just type 'lm-sensors' into
> SoftwareManager. We won't fool with gkrellm just yet. After the rpms
> are installed, su to root and run 'sensors-detect'. All the default
> answers to the questions it presents should be ok, just keep hitting
> <Enter>. When it gets towards the end, it'll output some lines that
> you need to edit into the end of /etc/rc.d/rc.local and
> /etc/modules.conf (it says which lines go in which file). While we're
> at it, add 'i2c-proc' (w/o the quotes)
> to /etc/modules. Gettin back to 'sensors-detect', it probly has one
> more question ... install the sensors.conf file?, say Yes. Then back
> in ('cd' to) /etc/rc.d/ ... type './rc.local' to restart rc.local
> and have the modules take effect. Then as user you should see
> temp/voltage/fan outputs when you type 'sensors' in a terminal. Some
> have reported a reboot is necessary, but I've never needed to.
>
> We'll concentrate on the cpu temp for now. The cpu temp should stay
> under 60C (from a probe), under 55C is better under extreme load (eg, a
> kernel compile, specially 'make modules', running cpuburn, etc.) For
> normal operation it should be in the mid 40's to low 50's. It's
> during high temp spikes or sustained load that systems freeze or
> spontaneous reboots occur. Keep an eye on system voltages too tho,
> they should be very close to slightly (+10%) over the voltages spec'd
> for your motherboard/cpu, and stay very steady.
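The 10% rule of thumb above is easy to check by hand, but a sketch of the arithmetic may help when reading several rails at once. The helper name within_tolerance is hypothetical, and the nominal values you pass in have to come from your own board and PSU specs.

```shell
# within_tolerance MEASURED NOMINAL PERCENT
# Succeeds (exit 0) when MEASURED is within PERCENT of NOMINAL.
# Works for negative rails too, since both the difference and the
# allowed margin are taken as absolute values.
within_tolerance() {
    awk -v m="$1" -v n="$2" -v p="$3" 'BEGIN {
        d = m - n;           if (d < 0)   d = -d
        lim = n * p / 100;   if (lim < 0) lim = -lim
        exit !(d <= lim)
    }'
}
```

For example, `within_tolerance 11.2 12 10 && echo ok` passes, since 11.2 is 0.8 off against an allowed 1.2, while a reading of 10.5 on the same rail would fail.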
>
> So for the acid test, cpuburn. It's probly on your CD's, if not get
> it here http://users.ev1.net/~redelm/ For your XP 1600+ you want
> to run 'burnK7'. While doin so, in another terminal check the output of
> 'sensors' frequently. If the cpu temp climbs to 65C and starts going
> over, abort burnK7 (Ctrl+C), and figure out what you need to do to
> improve cooling. 'Cause that's most likely your crash problem. If you
> notice the -5 and -12 volt outputs dropping too low (more'n 10%), then
> the PSU could be the problem. Voltage drops cause lockups/freezes too.
> If it all looks OK, and you can run burnK7 for an hour, your crash
> problem almost surely isn't hardware.
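Rather than checking 'sensors' by hand, the burn-in could be watched by a loop along these lines. This is only a sketch: it assumes the temperature rows in your 'sensors' output start with "temp" (labels vary by chip and sensors.conf), and max_temp and watch_burn are made-up names. The 65C limit is the one given above.

```shell
# max_temp: read 'sensors'-style text on stdin and print the highest
# reading found on rows whose label starts with "temp".
# ASSUMPTION: your chip labels its temperature rows that way.
max_temp() {
    awk -F: '/^[Tt]emp/ {
        # first number after the label, e.g. "+48.0 C (limit ...)" gives 48.0
        if (match($2, /[0-9]+(\.[0-9]+)?/)) {
            t = substr($2, RSTART, RLENGTH) + 0
            if (t > max) max = t
        }
    } END { print max + 0 }'
}

# watch_burn [LIMIT]: start burnK7 in the background, poll 'sensors'
# every 5 seconds, and kill burnK7 once the probe reaches LIMIT
# (default 65). Stop it yourself with Ctrl+C when you've seen enough.
watch_burn() {
    limit=${1:-65}
    burnK7 &
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do
        t=$(sensors | max_temp)
        echo "cpu probe: ${t}C"
        if awk -v t="$t" -v l="$limit" 'BEGIN { exit !(t >= l) }'; then
            kill "$pid"
            echo "aborted burnK7 at ${t}C" >&2
            return 1
        fi
        sleep 5
    done
}
```

Test the parsing first with `sensors | max_temp` and make sure the number matches what you see on screen before trusting `watch_burn` to pull the plug.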
>
> Sorry for being so long winded, but I warned you that diagnosing
> hardware over the phone was difficult ;)
Thank you very much for your detailed information here. I really appreciate
your time on this. I just got this tonight so will study and try these things
the next few days. I will let you know my results. I am sure this will be
resolved eventually.
Thanks again.
Sincerely,
Marcia