Greetings! I'm forwarding this previously submitted bug report to
the Beowulf lists and the LAM users list, looking for interested users
who could confirm, deny, or help resolve this bug.

----- Forwarded message -----
From: Camm Maguire <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED],[EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: Bug in lam-6.2pl3/blacs1.1/scalapack1.6 combo
Date: Fri, 17 Sep 1999 23:07:52 -0400

Greetings! I've found a quite reproducible bug in the above software
combination. The command

    mpirun -np 16 -O N xdinv

consistently fails with N=2048, nb=16, nr=nc=4 somewhere in the routine
pdgetri, specifically in the loop from lines 285 to 306. Running with
the -lamd option to mpirun clears the problem, which suggests that LAM
is involved in the failure. The MPI routines report the following error:

    MPI_Recv: process in remote group is dead (rank 0, comm 3)

where the rank and comm numbers vary with no discernible pattern. I'm
running Linux 2.2.12 on a 16-node PII350 Beowulf over 100 Mbit
switched Fast Ethernet. No errors are reported in the kernel logs.
LAM was configured with

    ./configure --prefix=`pwd`/debian/tmp/usr/lib/lam \
        --with-final-home=/usr/lib/lam \
        --with-rpi=usysv \
        --with-shared \
        --with-cc=$(CC)

and built with:

    intech19:/fix/c/home/camm/scalapack-1.6# egcc -v
    Reading specs from /usr/lib/gcc-lib/i486-linux/egcs-2.91.60/specs
    gcc version egcs-2.91.60 Debian 2.1 (egcs-1.1.1 release)

I've noticed that the problem block size (at least the most frequent
one) is 16 in double precision, which corresponds to a 2 KB message,
the same length as the LAM/Linux performance problem reported on the
LAM web site. Here, however, we see outright failure rather than just
poor performance. I'll be trying LAM 6.2 pl4 soon. Please let me know
if I can supply any further information about this bug.

P.S. Since writing this, I've tried lam-6.2b-pl4 and found the same
behavior. The problem appears for block sizes in the 16-28 range;
outside that range, everything is stable. BLACS is patched with the
latest MPI patch.

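As a rough check on the block-size/message-size reasoning above, this small sketch computes the size of one nb x nb block of double-precision values across the failing 16-28 range. It assumes (this is not stated in the report) that each message carries exactly one full square block of 8-byte doubles; the actual BLACS/ScaLAPACK messages in pdgetri may be partial panels or packed differently. The function name block_message_bytes is hypothetical.

```python
# Sketch only: bytes in one square nb x nb block of IEEE double-precision values.
# ASSUMPTION (not from the report): one message == one full nb x nb block.
# Real BLACS/ScaLAPACK traffic in pdgetri may send partial or packed blocks.
DOUBLE_BYTES = 8  # size of an IEEE 754 double


def block_message_bytes(nb: int) -> int:
    """Return the size in bytes of one nb x nb block of doubles (hypothetical helper)."""
    return nb * nb * DOUBLE_BYTES


if __name__ == "__main__":
    # Sample block sizes spanning the reported failing range, 16..28.
    for nb in range(16, 29, 4):
        print(f"nb={nb:2d}: {block_message_bytes(nb):5d} bytes")
```

Under that assumption, nb=16 gives 2048 bytes (the 2 KB size mentioned above), and nb=28 gives 6272 bytes, so the failing range would correspond to messages of roughly 2 to 6 KB.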
Take care,
Camm Maguire [EMAIL PROTECTED]
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah