On Wed, May 19, 1999 at 01:55:11AM -0400, Donald Becker wrote:
> On Tue, 18 May 1999, Eric Roman wrote:
>
> > for the UL spec. We had lots and lots of problems. We fixed them, but not
> > before calling our facilities people and doing a lot of power systems work,
> > including isolated grounds, doubly shielded isolation transformers, double
> > rated neutral wires, etc.
>
> It sounds as if you had some other problem -- my first reaction was "split
> neutral, flickering monitors".
The wiring for the room was done properly. Once we found out that we
had 60A of current going through our neutral in the panel we ran around
and checked everything we could think of. We shut down and rebalanced
the loads among the three phases, we checked the current draws again, we
checked the ground to neutral voltages on each outlet, etc. The problem
was that the electrician sort of thought that this was supposed to be an
office with a lot of computers in it (someone lied to our facilities
people 'cuz they didn't want to have to set the room up as a full
computer lab). So he ran one neutral for the entire room from the panel,
as opposed to one neutral per circuit, the way computer rooms are done.
> > Make sure that your electrician double rates your neutral. The neutral
> > conductor for your panel should be twice as large as the hot conductors.
>
> Errmm, only for multiphase power. Then the neutral conductor might be
> carrying more power than any single hot conductor. Although it's not
> double the power, the rule is designed for simplicity.
Of course for multiphase! The building mains are three phase four
wire deals. You run that into a panel. Neutral shouldn't carry any
power at all. That's the rule. In a linear, balanced, 3 phase system
the neutral conductor carries zero current. With computers, even when
you're balanced, you start seeing harmonics of the 60Hz current in the
neutral conductor. The fundamental and most of the harmonics still cancel
in the neutral, but the triplen harmonics (3rd, 9th, 15th, ...) are in
phase on all three legs, so instead of cancelling they add -- and switching
power supplies draw a lot of 3rd harmonic. So you start seeing massive
current flow in a neutral that's supposed to carry no current.
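A quick back-of-the-envelope way to see the cancellation (made-up amplitudes, not measurements from our room):

```python
import math

# Sketch with invented amplitudes: each phase draws a 60 Hz fundamental
# plus a 3rd harmonic, roughly what a switching supply looks like.
def phase_current(t, shift, i1=10.0, i3=4.0):
    w = 2 * math.pi * 60
    return i1 * math.sin(w * t - shift) + i3 * math.sin(3 * (w * t - shift))

shifts = (0.0, 2 * math.pi / 3, 4 * math.pi / 3)  # phases A, B, C

# At any instant the fundamentals sum to zero, but the 3rd harmonics are
# in phase on all three legs, so the neutral carries 3 * i3 * sin(3wt)
# even with a perfectly balanced load.
for t in (0.001, 0.004, 0.011):
    neutral = sum(phase_current(t, s) for s in shifts)
    expected = 3 * 4.0 * math.sin(3 * 2 * math.pi * 60 * t)
    assert abs(neutral - expected) < 1e-9
```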
> With common bi-phase power wiring the neutral will never carry more than a
> single supply conductor.
That's on the supply side away from the panel. The neutral still shouldn't
carry any power right? It's bonded back to ground in the panel (well, the
building transformer). All the current flows should cancel out (neglecting
wire impedance). If your neutral is carrying power then you've got some
serious problems. (Probably a reversed hot and neutral)
> > Tell 'em to read the new NEC and the IEEE Emerald Book if he argues. Make
> > sure that you run separate neutral wires for each circuit.
>
> It sounds as if you did have a split neutral..
> They are pretty nasty if you have to live in front of a CRT.
Again, that was one of the first things we checked for. It was the easiest
to fix. No such luck.
> > So lemme say again. Plan for UL spec. If you're not willing to do this,
> > have at least one power engineer plan out the entire facility.
>
> The UL is a safety specification, mostly relating to avoiding fires and
> hazards. The local electrical codes, typically a slight tweak on the NEC,
> should be followed for supply sizing.
Yah. My whole point here was to show that the people at UL who rate power
supplies are quite a bit smarter than those of us who thought that checking
the power with an ammeter would be sufficient.
While I've got your attention though, would you mind talking about
clustering? We're building a cluster of 128 dual Pentium II systems with
Fast Ethernet. We've got about 80 duals online right now. Last year
we spent a long time playing around with the NAS Parallel Benchmarks
over MPICH on kernel 2.0.33. We got some very, very, horrible numbers.
I mean they weren't bad for a Linux cluster of say 16 duals. But when
we started doing 64 processor runs we got some horrible performance.
Here's an example:
Name  Class  NC    Time  Mop/s  Mop/s/proc  Version  Filename
CG    A       1   65.91  22.71       22.71      2.3  cg.A.1.egcs3-t3
CG    A       2   44.98  33.27       16.63      2.3  cg.A.2.egcs3-serial
CG    A       4   45.74  32.72        8.18      2.3  cg.A.4.egcs3-serial
CG    A       8   34.67  43.16        5.39      2.3  cg.A.8.egcs3
CG    A      16   34.31  43.61        2.73      2.3  cg.A.16.egcs3
CG    A      32   31.92  46.88        1.47      2.3  cg.A.32.egcs3
CG    A      64   28.72  52.11        0.81      2.3  cg.A.64.egcs3
Speedup of 2 for 64 processors? Does this make any sense whatsoever?
Granted, CG sends a lot of small messages, and this is a relatively
small problem size. But I'd hope to at least get something better than
a speedup of 2!
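For what it's worth, the speedup figure comes straight out of the Mop/s column above:

```python
# Speedup and parallel efficiency computed from the CG class A table
# (Mop/s on N processors relative to the 1-processor run).
serial = 22.71
runs = {2: 33.27, 4: 32.72, 8: 43.16, 16: 43.61, 32: 46.88, 64: 52.11}

for procs in sorted(runs):
    speedup = runs[procs] / serial
    efficiency = speedup / procs
    # The 64-processor line works out to a speedup of about 2.3,
    # i.e. roughly 3.6% parallel efficiency.
    print("%3d procs: speedup %5.2f, efficiency %5.1f%%"
          % (procs, speedup, 100 * efficiency))
```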
Here's what lu class C looks like:
Name  Class  NC      Time    Mop/s  Mop/s/proc  Version  Filename
LU    C       4  13890.64   146.79       36.70      2.3  lu.C.4.egcs3
LU    C       8   7112.88   286.66       35.83      2.3  lu.C.8.egcs3
LU    C      16   3580.95   569.40       35.59      2.3  lu.C.16.egcs3
LU    C      32   1883.28  1082.68       33.83      2.3  lu.C.32.egcs3
LU    C      64   1181.25  1726.13       26.97      2.3  lu.C.64.egcs2
Much nicer. Good? Well, that's a matter of opinion. We can get about
45 Mop/s/proc running this in serial. It looks like we lose a good bit
of the performance on a 2 processor machine while under heavy IO load.
It also looks like there's a ton of overhead associated with sending
small messages. So far we haven't learned a damn thing. (Where are the
128 processor results? Nowhere to be found. MPICH doesn't seem to want
to use more than 122 processors... It stops understanding how to query
DNS (?) at some point and then runs out of filehandles...)
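On the filehandle front, the per-process descriptor limit is easy to inspect -- a generic sketch, nothing MPICH-specific:

```python
import resource

# Sketch: check and, where permitted, raise the per-process open-file
# limit that a large mpirun can exhaust. RLIMIT_NOFILE is standard POSIX;
# the actual numbers are whatever this shell happens to allow.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files: soft=%s hard=%s" % (soft, hard))

if soft < hard and hard != resource.RLIM_INFINITY:
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```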
This is where you tell me to run a 2.2 kernel because the spinlocks are
a bit more isolated, you can have more open filehandles, etc.
No one here understands why, but kernel 2.2 seriously breaks MPICH. Jobs
run unreliably (at least as of 2.2.3; I haven't tried 2.2.9 yet 'cuz I'm
sick of doing global kernel upgrades every time a new kernel comes out...).
This started somewhere around 2.1.80 or so. There are two classic symptoms.
1/ The load on every node drops to zero except one node, which stays at 1.
ps on the node w/ a load of one shows it select()ing an invalid file
handle quite often. netstat shows data being held up in a socket
that p4 seems to know nothing about. Our guess is that the kernel is
confusing itself by reordering filehandles and losing some of the data.
This happens on any number of parallel runs, but is more likely on larger
runs (more processors).
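The invalid-handle symptom is easy to reproduce outside MPICH; a sketch (plain pipes, no p4 involved) of what select() on a stale descriptor looks like:

```python
import errno
import os
import select

# Sketch (no p4 involved): select() on a descriptor that has already been
# closed fails with EBADF -- consistent with a process spinning on
# select() over a handle it believes is still open.
r, w = os.pipe()
os.close(r)
os.close(w)
try:
    select.select([r], [], [], 0)
    raise AssertionError("select() accepted a closed descriptor")
except OSError as e:
    assert e.errno == errno.EBADF
```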
2/ p4 timeouts w/ closed control connections.
This happens mainly on large jobs when communication gets very heavy.
Josip Loncaric reported something similar a few months back, and he's had
some luck disabling Nagle's algorithm on the sockets with TCP_NODELAY.
He's been able to get better reliability, but no real fix. I think that what
happens is that data packets are delayed for so long on a congested
system that p4's control connection times out and shuts down the job.
Some of the p4 processes realise this, many don't and the jobs have to
be terminated manually.
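For the record, that workaround boils down to setting TCP_NODELAY on each socket; a minimal sketch (not p4's actual code):

```python
import socket

# Sketch (not p4's actual code): disable Nagle's algorithm so small
# messages go out immediately instead of being queued behind
# unacknowledged data.
def nodelay_socket():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s

s = nodelay_socket()
assert s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
s.close()
```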
I've mailed the MPICH developers; they say, "Hey, that's a problem!
Maybe we'll fix it later this year." I've mailed the beowulf and
Linux SMP lists and been told for about forty kernel revisions now that
the problem was fixed in the latest kernel...
We don't really know what to do. LAM 6.1 works somewhat reliably
(ironically, when you turn off guaranteed delivery and use a whole slew of
command line and compile time options...) and slowly. I only know of two
other people who see this problem, Doug Eadline from Paralogic and Josip
Loncaric. The only common denominator is the use of SMP Linux systems.
As for efficiency, I'm planning on testing the VIA software on our Gigabit
network later this year when an MPI implementation is available. I still
don't understand why Linux can't deliver fixed size messages without 3
orders of magnitude of deviation in time or why MPICH just refuses to
work correctly.
If you've got any advice or ideas I'd love to give 'em a try. I think
we're building one of the largest clusters of SMP Linux systems and it'd be
nice if it worked!
Thanks
--
Eric Roman <[EMAIL PROTECTED]> Department of Applied Mathematics
(516)632-8545 SUNY/Stony Brook
-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]