Hi John, Billy,
For what it's worth, the power distribution design on the BEE2 has
never been stellar. I haven't seen problems specifically with XAUI,
but during the massive orgy of BEE2 production and testing, it was
found that User FPGAs 2 and 3 tended to have higher DRAM error rates.
These 2 FPGAs are further from the power supplies, and I think they
(RAMP guys) had actually measured a noticeably larger voltage drop
for the DRAM power rail on these 2 FPGAs.
What you're seeing may or may not be a variation on this theme...
though each FPGA actually has independent linear regulators for
the MGT power supplies, so the effect should be minimized. If
you're interested, you can try poking around those supplies; each
FPGA has 4 linear regulators on the bottom of the board in the 5-pin
TO-type packages. Two supply 2.5V, and two supply 1.8VTT for the
MGTs; both are powered by the main 3.3V rail on the board.
Another simple thing to try would be to heatsink the main 3.3V
regulator (P26, the black one closest to FPGA 4), if it isn't
already. That one can get *hot*.
Thanks,
Henry
John Ford wrote:
John,
Coincidentally, I've found user 2 to be the worst of the bunch across
the HCRO BEE2s by a good margin. Some variation between units, of
course, but if it were me I'd try to run on a different slot (user 1 or
4) and see if it still breaks. Esp. as most errors I see with user 2
(FWD/REV BER, F.O. CX4 weirdness) can be tied (in some way) to driving
those ports.
Interesting. Thanks. We can give that a try, but we'll have to modify
our designs a bit. (a lot!)
Out of curiosity, have you tried staggering your output packets so they
fire in sequence rather than simultaneously?
We're about to try that now! Putting a few tens of clocks of delay in the
chain. We can't fully serialize them for obvious reasons. We need at
least 22.5 Gb/second out of the four ports.
John
Billy
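A rough timing budget for the staggering idea above, as a sketch only. The 22.5 Gb/s requirement and the four ports come from this thread; the 10 Gb/s per-port line rate, reading "8K packet" as 8192 bytes, the 8-bytes-per-clock transmit datapath, and the model in which all four packets become ready together and must finish before the next batch are assumptions made for illustration. Header overhead is ignored.

```python
REQUIRED_GBPS = 22.5      # aggregate output requirement quoted in the thread
PORTS = 4                 # four 10 GbE ports on the one FPGA
LINE_RATE_GBPS = 10.0     # assumption: nominal per-port line rate
PACKET_BYTES = 8192       # assumption: "8K packet" taken as 8192 bytes
BYTES_PER_CLOCK = 8       # assumption: 64-bit transmit datapath


def stagger_budget(stagger_clocks):
    """Return timing figures for one batch of PORTS staggered packets."""
    per_port_gbps = REQUIRED_GBPS / PORTS
    duty = per_port_gbps / LINE_RATE_GBPS
    packet_clocks = PACKET_BYTES / BYTES_PER_CLOCK
    # Clocks available per batch if each port must sustain its share.
    period_clocks = packet_clocks / duty
    # The last port starts (PORTS - 1) stagger intervals after the first.
    batch_clocks = stagger_clocks * (PORTS - 1) + packet_clocks
    return {
        "per_port_gbps": per_port_gbps,
        "packet_clocks": packet_clocks,
        "period_clocks": period_clocks,
        "batch_clocks": batch_clocks,
        "fits": batch_clocks <= period_clocks,
    }


if __name__ == "__main__":
    for stagger in (0, 30, 1024):   # 1024 clocks = one full packet time
        print(stagger, stagger_budget(stagger))
```

Under these assumptions each port has to sustain 5.625 Gb/s, so a stagger of a few tens of clocks fits within the batch period, while a stagger of one full packet time (full serialization) does not.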
-----Original Message-----
From: John Ford [mailto:[email protected]]
Sent: Wednesday, February 03, 2010 9:47 PM
To: Barott, William Chauncey
Cc: Matt Dexter; [email protected]
Subject: Re: [casper] 10 GbE ports on a BEE2
Matt-
Good summary. Let me correct recent changes:
B05, B10, and B15 FPGA 4 each have two active XAUI ports (one of them
runs both with fiber CX4, I think, the others are a mix), and one active
10GbE port.
Have several copies of similar firmware running on B16.
Correct that we do not run high speed on the center FPGA - corners only
right now. Also do not have any designs with more than one 10GbE port per
FPGA (most of our interchip comm is XAUI), so I don't know how helpful
these numbers are.
John:
Have you been locking up the same FPGA every time, or can you lock up any
arbitrary user FPGA? I've seen some effects (link reliability / BER,
ability to drive fiber CX4) where there is measurable variation between
different user FPGAs. So far, I have not correlated user-fpga-location to
firmware lockups, but wouldn't immediately rule it out, esp. if this
wasn't a design to be run on all corners.
We've only tried it on this one FPGA. (user-2) It's the output of our
coherent dispersion machine, and all 4 ports are used to send data to the
switch for distribution to the GPU machines. One other FPGA has 4 XAUI
links, but they are only used to receive data on that FPGA, so it doesn't
really drive anything.
The way our logic works, all four ports fire off an 8K packet
simultaneously. We've speculated that maybe that's too much drive power
at one instant. We haven't done any probing with the scope on the power
supplies as of yet.
It's weird that whatever is going on would lock up the SelectMAP
interface between the FPGAs.
We always seem to find the corner cases. Sigh.
John
Billy
-----Original Message-----
From: [email protected] on behalf of Matt Dexter
Sent: Wed 2/3/2010 8:14 PM
To: [email protected]
Subject: Re: [casper] 10 GbE ports on a BEE2
Hi,
I'm not sure this is too helpful as this system uses 10 Gbps XAUI and not
10 GbE (and I'm sure Billy Barott could say more) but for the record (and
since Dan asked) I believe the ATA Beamformer's 16 BEE2s are cabled as
follows.
ATA BF: number of 10 Gbps ports used per BEE2 and FPGA
BEE2   Center   Corner FPGAs
       FPGA     1     2     3     4
BF1
B01    0+0      3+0   1+2   4+0   3+0
B02    0+0      3+0   1+2   4+0   3+0
B03    0+0      3+0   1+2   4+0   3+0
B04    0+0      3+0   1+2   4+0   3+0
B05    0+0      2+2   3+0   3+0   0+0
BF2
B06    0+0      3+0   1+2   4+0   3+0
B07    0+0      3+0   1+2   4+0   3+0
B08    0+0      3+0   1+2   4+0   3+0
B09    0+0      3+0   1+2   4+0   3+0
B10    0+0      3+1   3+0   3+0   0+0
BF3
B11    0+0      3+0   1+2   4+0   3+0
B12    0+0      3+0   1+2   4+0   3+0
B13    0+0      3+0   1+2   4+0   3+0
B14    0+0      3+0   1+2   4+0   3+0
B15    0+0      2+1   3+0   3+0   0+0
etc
B16    0+0      0+0   2+1   1+2   1+0
First number is number of passive copper CX4 cables.
Second number is number of active fiber optic CX4 cables.
The maximum number of usable active CX4 cables per BEE2 is quite low
due to power supply limitations.
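To see how many active fiber cables each board carries, the table can be tallied with a short script. This is only a sketch: the rows are copied from the table as posted in this message (it does not fold in any corrections made elsewhere in the thread), and the names `ROWS` and `cable_totals` are made up for illustration.

```python
# Rows copied from the table in this message: BEE2 name, then
# "passive+active" cable counts for the center FPGA and corner FPGAs 1-4.
ROWS = """
B01 0+0 3+0 1+2 4+0 3+0
B02 0+0 3+0 1+2 4+0 3+0
B03 0+0 3+0 1+2 4+0 3+0
B04 0+0 3+0 1+2 4+0 3+0
B05 0+0 2+2 3+0 3+0 0+0
B06 0+0 3+0 1+2 4+0 3+0
B07 0+0 3+0 1+2 4+0 3+0
B08 0+0 3+0 1+2 4+0 3+0
B09 0+0 3+0 1+2 4+0 3+0
B10 0+0 3+1 3+0 3+0 0+0
B11 0+0 3+0 1+2 4+0 3+0
B12 0+0 3+0 1+2 4+0 3+0
B13 0+0 3+0 1+2 4+0 3+0
B14 0+0 3+0 1+2 4+0 3+0
B15 0+0 2+1 3+0 3+0 0+0
B16 0+0 0+0 2+1 1+2 1+0
"""


def cable_totals(rows=ROWS):
    """Return {bee2_name: (passive_total, active_total)}."""
    totals = {}
    for line in rows.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        passive = active = 0
        for cell in fields[1:]:
            p, a = cell.split("+")
            passive += int(p)
            active += int(a)
        totals[fields[0]] = (passive, active)
    return totals


if __name__ == "__main__":
    totals = cable_totals()
    for name, (passive, active) in totals.items():
        print(f"{name}: {passive} passive, {active} active")
    print("max active per BEE2:", max(a for _, a in totals.values()))
```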
I believe various lab tests were done using 1 of the 2
high speed ports on the center FPGA but at the moment none
are in use at the observatory (as far as I know).
Matt Dexter
On Wed, 3 Feb 2010, John Ford wrote:
Hi all. Has anyone seen any problems when using 4 10 GbE ports on a
single FPGA in the BEE2?
We're having some problems where borph access from the control FPGA seems
to lock up when we start up our designs. We've done some testing with
hacked up designs, and we can only make it lock up (so far) when we have 4
10 GbE ports active. Interestingly, some of the 10 GbE ports also seem to
lock up at that very instant.
The FPGA still is running, but the control FPGA cannot talk to it OR ANY
OF THE OTHER 3 FPGAs. If you kill the borph process on the control FPGA
and restart it, it comes back to life.
We don't know what goes on under the hood of the yellow block system, but
it seems that something is going wrong with whatever mechanism controls
the bus to the control FPGA.
John