John,
Coincidentally, I've found user 2 to be the worst of the bunch across
the HCRO BEE2s by a good margin. Some variation between units, of
course, but if it were me I'd try to run on a different slot (user 1
or 4) and see if it still breaks. Esp. as most errors I see with user 2
(FWD/REV BER, F.O. CX4 weirdness) can be tied (in some way) to driving
those ports.
Billy
-----Original Message-----
From: John Ford [mailto:[email protected]]
Sent: Wednesday, February 03, 2010 9:47 PM
To: Barott, William Chauncey
Cc: Matt Dexter; [email protected]
Subject: Re: [casper] 10 GbE ports on a BEE2
Matt-
Good summary. Let me correct recent changes:
B05, B10, and B15 FPGA 4 each have two active XAUI ports (one of them
runs both with fiber CX4, I think, the others are a mix), and one active
10 GbE port.
Have several copies of similar firmware running on B16.
Correct that we do not run high speed on the center FPGA - corners only
right now. Also do not have any designs with more than one 10 GbE port
per FPGA (most of our interchip comm is XAUI), so I don't know how
helpful these numbers are.
John:
Have you been locking up the same FPGA every time, or can you lock up
any arbitrary user FPGA? I've seen some effects (link reliability / BER,
ability to drive fiber CX4) where there is measurable variation between
different user FPGAs. So far, I have not correlated user-FPGA location
to firmware lockups, but wouldn't immediately rule it out, esp. if this
wasn't a design to be run on all corners.
We've only tried it on this one FPGA (user 2). It's the output of our
coherent dispersion machine, and all 4 ports are used to send data to
the switch for distribution to the GPU machines. One other FPGA has 4
XAUI links, but they are only used to receive data on that FPGA, so it
doesn't really drive anything.
The way our logic works, all four ports fire off an 8K packet
simultaneously. We've speculated that maybe that's too much drive power
at one instant. We haven't done any probing with the scope on the power
supplies as of yet.
It's weird that whatever is going on would lock up the SelectMAP
interface between the FPGAs.
We always seem to find the corner cases. Sigh.
John
Billy
-----Original Message-----
From: [email protected] on behalf of Matt Dexter
Sent: Wed 2/3/2010 8:14 PM
To: [email protected]
Subject: Re: [casper] 10 GbE ports on a BEE2
Hi,
I'm not sure this is too helpful as this system uses 10 Gbps XAUI and
not 10 GbE (and I'm sure Billy Barott could say more) but for the record
(and since Dan asked) I believe the ATA Beamformer's 16 BEE2s are cabled
as follows.
ATA BF number of 10 Gbps ports used per BEE2 and FPGA:
BEE2   Center   Corner FPGAs
       FPGA     1     2     3     4
BF1
B01    0+0      3+0   1+2   4+0   3+0
B02    0+0      3+0   1+2   4+0   3+0
B03    0+0      3+0   1+2   4+0   3+0
B04    0+0      3+0   1+2   4+0   3+0
B05    0+0      2+2   3+0   3+0   0+0
BF2
B06    0+0      3+0   1+2   4+0   3+0
B07    0+0      3+0   1+2   4+0   3+0
B08    0+0      3+0   1+2   4+0   3+0
B09    0+0      3+0   1+2   4+0   3+0
B10    0+0      3+1   3+0   3+0   0+0
BF3
B11    0+0      3+0   1+2   4+0   3+0
B12    0+0      3+0   1+2   4+0   3+0
B13    0+0      3+0   1+2   4+0   3+0
B14    0+0      3+0   1+2   4+0   3+0
B15    0+0      2+1   3+0   3+0   0+0
etc
B16    0+0      0+0   2+1   1+2   1+0
First number is number of passive copper CX4 cables.
Second number is number of active fiber optic CX4 cables.
The maximum number of usable active CX4 cables per BEE2 is quite low
due to power supply limitations.
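In case it helps anyone tally the table, here is a minimal Python
sketch. It assumes the rows are typed in exactly as listed above (only
a few boards are shown), with the center FPGA first and then corner
FPGAs 1-4. The names CABLING, tally and max_ports_on_one_fpga are made
up for illustration and are not part of any CASPER tool.

```python
# Rows copied from the table above: center FPGA first, then corner FPGAs 1-4.
# Each entry is "copper+fiber" (passive copper CX4 + active fiber optic CX4).
CABLING = {
    "B01": ["0+0", "3+0", "1+2", "4+0", "3+0"],
    "B05": ["0+0", "2+2", "3+0", "3+0", "0+0"],
    "B10": ["0+0", "3+1", "3+0", "3+0", "0+0"],
    "B15": ["0+0", "2+1", "3+0", "3+0", "0+0"],
    "B16": ["0+0", "0+0", "2+1", "1+2", "1+0"],
}


def tally(entries):
    """Return (copper, fiber) cable totals for one BEE2 (hypothetical helper)."""
    copper = fiber = 0
    for entry in entries:
        c, f = entry.split("+")
        copper += int(c)
        fiber += int(f)
    return copper, fiber


def max_ports_on_one_fpga(entries):
    """Largest number of 10 Gbps ports in use on any single FPGA of a board."""
    return max(sum(int(n) for n in entry.split("+")) for entry in entries)


if __name__ == "__main__":
    for board, entries in CABLING.items():
        copper, fiber = tally(entries)
        print(f"{board}: {copper} copper, {fiber} fiber, "
              f"max {max_ports_on_one_fpga(entries)} ports on one FPGA")
```

This only sums what is in the table; it says nothing about where the
power supply limit on active cables actually sits.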
I believe various lab tests were done using 1 of the 2
high speed ports on the center FPGA but at the moment none
are in use at the observatory (as far as I know).
Matt Dexter
On Wed, 3 Feb 2010, John Ford wrote:
Hi all. Has anyone seen any problems when using four 10 GbE ports on a
single FPGA in the BEE2?
We're having some problems where borph access from the control FPGA
seems to lock up when we start up our designs. We've done some testing
with hacked up designs, and we can only make it lock up (so far) when we
have four 10 GbE ports active. Interestingly, some of the 10 GbE ports
also seem to lock up at that very instant.
The FPGA is still running, but the control FPGA cannot talk to it OR ANY
OF THE OTHER 3 FPGAs. If you kill the borph process on the control FPGA
and restart it, it comes back to life.
We don't know what goes on under the hood of the yellow block system,
but it seems that something is going wrong with whatever mechanism
controls the bus to the control FPGA.
John