Hi, Mark,
Thanks for the reply. We hadn't considered power
before. This is now the second ATX power supply
with which we've had this problem. We were using a
250 W supply, and now we have a 300 W supply. The
machine has been on an APC SmartUPS 1250 the entire
time. I'll try submitting those 'main' jobs as you
suggested. But with our particular application,
'top' indicates that both (or more) submitted jobs
are running, and their logfiles are getting written
out properly. But then one of them crashes, usually
after about 20 minutes. What are "remarked"
processors? Thanks again,
Hidong
--- Mark Hahn <[EMAIL PROTECTED]> wrote:
>
> > problem. You're probably thinking that there's
> > something wrong with the application. But
> > several of
>
> not at all. I'm thinking there's something wrong
> with your hardware.
> memory, for instance. or cooling, or power.
We hadn't considered power. This is the second ATX
power supply under which we've seen this problem. We
had a 250 W supply before, and we recently got a 300
W supply, which is the one we're using now. The
machine has always been on an APC SmartUPS 1250. Any
culprits here?
>
> > everything seems to check out. We've tried various
> > kernels including 2.0.35, 2.0.36, 2.1.125, and
> > 2.2.0-pre6, all with the same problem. With all of
>
> which pretty much rules out the possibility of any
> fault in the kernel, since there's essentially zero
> code overlap between, say, 2.0.35 and 2.2.0
>
> > the smp kernels we've compiled, 'cat /proc/cpuinfo'
> > indicates both processors, and their specs look
> > right. Is there some smp benchmark we can run that
> > would indicate that the processors are actually
> > running ok, instead of just telling you that
> > they're
> > there?
>
> run two copies of anything, even "main() {while (1);}".
> observe, using something like top, that they're
> both running.
>
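A minimal sketch of the test Mark describes above, under a few
assumptions: instead of starting two copies by hand, one parent
process forks the busy loops, lets each spin for a fixed time, and
then reports whether any of them died from a signal. The names
spin_for, run_spinners and BURN_STANDALONE are made up for this
illustration and do not come from the thread.

```c
/* Sketch only: spin_for, run_spinners and BURN_STANDALONE are
 * hypothetical names chosen for this example. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Busy-loop for roughly `seconds` of wall-clock time
 * (time() only has one-second resolution). */
void spin_for(int seconds)
{
    time_t start = time(NULL);
    volatile unsigned long counter = 0;

    while (difftime(time(NULL), start) < (double)seconds)
        counter++;
}

/* Fork `n` busy-loop children, wait for all of them, and return
 * how many exited normally with status 0. */
int run_spinners(int n, int seconds)
{
    int started = 0;
    int clean = 0;

    assert(n >= 0);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child: spin, then leave without running parent cleanup. */
            spin_for(seconds);
            _exit(0);
        }
        started++;
    }

    for (int i = 0; i < started; i++) {
        int status;
        pid_t pid = wait(&status);
        if (pid < 0) {
            perror("wait");
            break;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
        else if (WIFSIGNALED(status))
            fprintf(stderr, "pid %ld killed by signal %d\n",
                    (long)pid, WTERMSIG(status));
    }
    return clean;
}

#ifdef BURN_STANDALONE
/* Usage: ./burn [children] [seconds], defaulting to 2 and 60. */
int main(int argc, char **argv)
{
    int n = (argc > 1) ? atoi(argv[1]) : 2;
    int seconds = (argc > 2) ? atoi(argv[2]) : 60;

    if (n < 0)
        n = 0;

    int clean = run_spinners(n, seconds);
    printf("%d of %d spinners exited cleanly\n", clean, n);
    return clean == n ? 0 : 1;
}
#endif
```

Built with something like "cc -DBURN_STANDALONE -o burn burn.c" (the
file name is arbitrary) and run as "./burn 2 1800", this should show
two processes near 100% CPU in top for half an hour. A child killed
by a signal during that time would point at hardware rather than at
the application, though a clean run by itself would not prove the
hardware is fine.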
> > We've also considered overheating, but we get
> > this problem even with the case off.
>
> not meaningful. taking the case off can actually
> make cooling worse.
>
> > We've also
> > never overclocked our processors.
>
> are you sure they're not remarked?
>
> > What else could we
> > try?
>
> the power supply, first. then start taking
> hardware out.
> better yet, isolate a meaningful way to replicate
> the problem.
>
>