Re: [casper] 10 GbE ports on a BEE2

Henry Chen Thu, 04 Feb 2010 12:32:02 -0800

Hi John, Billy,

For what it's worth, the power distribution design on the BEE2 has
never been stellar. I haven't seen problems specifically with XAUI,
but during the massive orgy of BEE2 production and testing, it was
found that User FPGAs 2 and 3 tended to have higher DRAM error rates.
These 2 FPGAs are further from the power supplies, and I think they
(RAMP guys) had actually measured a noticeably larger voltage drop
for the DRAM power rail on these 2 FPGAs.


What you're seeing may or may not be a variation on this theme...
though each FPGA actually has independent linear regulators for
the MGT power supplies, so the effect should be minimized. If
you're interested, you can try poking around those supplies; each
FPGA has 4 linear regulators on the bottom of the board in the 5-pin
TO-type packages. Two supply 2.5V, and two supply 1.8VTT for the
MGTs; both are powered by the main 3.3V rail on the board.

Another simple thing to try would be to heatsink the main 3.3V
regulator (P26, the black one closest to FPGA 4), if it isn't
already. That one can get *hot*.

Thanks,
Henry


John Ford wrote:

John,
Coincidentally, I've found user 2 to be the worst of the bunch across
the HCRO BEE2s by a good margin.  Some variation between units, of
course, but if it were me I'd try to run on a different slot (user 1 or
4) and see if it still breaks.  Esp. as most errors I see with user 2
(FWD/REV BER, F.O. CX4 weirdness) can be tied (in some way) to driving
those ports.


Interesting.  Thanks.  We can give that a try, but we'll have to modify
our designs a bit. (a lot!)

Out of curiosity, have you tried staggering your output packets so they
fire in sequence rather than simultaneously?


We're about to try that now!  Putting a few tens of clocks of delay in the
chain.  We can't fully serialize them for obvious reasons.  We need at
least 22.5 Gb/second out of the four ports.
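
As a rough check on why the four packets cannot be fully serialized, the
arithmetic can be sketched as below.  This assumes "8K" means 8192 payload
bytes, a 10 Gb/s line rate on each port, and no header or inter-frame
overhead; none of those figures are stated in the thread, and all names
are illustrative.

```python
# Back-of-the-envelope timing for four 10 GbE ports sending 8K packets.
# Assumptions (not from the thread): 8K = 8192 payload bytes, each port
# runs at a 10 Gb/s line rate, and framing overhead is ignored.

LINE_RATE_BPS = 10e9        # assumed per-port line rate
PACKET_BITS = 8192 * 8      # assumed size of one "8K" packet
TOTAL_RATE_BPS = 22.5e9     # aggregate requirement quoted in the thread
NUM_PORTS = 4


def port_timing(total_rate_bps=TOTAL_RATE_BPS, num_ports=NUM_PORTS,
                line_rate_bps=LINE_RATE_BPS, packet_bits=PACKET_BITS):
    """Return (per-port rate, packet time on the wire, packet period)."""
    per_port_bps = total_rate_bps / num_ports
    wire_time_s = packet_bits / line_rate_bps   # one packet on the wire
    period_s = packet_bits / per_port_bps       # time between packet starts
    return per_port_bps, wire_time_s, period_s


per_port_bps, wire_time_s, period_s = port_timing()
duty_cycle = wire_time_s / period_s        # fraction of time a port is busy
serialized_s = NUM_PORTS * wire_time_s     # four packets sent back to back

print(f"per-port rate   : {per_port_bps / 1e9:.3f} Gb/s")
print(f"packet on wire  : {wire_time_s * 1e6:.2f} us")
print(f"packet period   : {period_s * 1e6:.2f} us")
print(f"port duty cycle : {duty_cycle:.1%}")
print(f"fully serialized: {serialized_s * 1e6:.2f} us per round")
print(f"fits in period  : {serialized_s <= period_s}")
```

Under these assumptions each port is busy more than half the time, so
sending the four packets strictly one after another would take longer than
the packet period allows.  A short stagger only offsets the start times;
the transmissions still overlap.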

John

Billy

-----Original Message-----
From: John Ford [mailto:[email protected]]
Sent: Wednesday, February 03, 2010 9:47 PM
To: Barott, William Chauncey
Cc: Matt Dexter; [email protected]
Subject: Re: [casper] 10 GbE ports on a BEE2

Matt-
Good summary.  Let me correct recent changes:
B05, B10, and B15 FPGA 4 each have two active XAUI ports (one of them runs
both with fiber CX4, I think, the others are a mix), and one active 10GbE
port.
Have several copies of similar firmware running on B16.

Correct that we do not run high speed on the center FPGA - corners only
right now.  Also do not have any designs with more than one 10GbE port per
FPGA (most of our interchip comm is XAUI), so I don't know how helpful
these numbers are.

John:
Have you been locking up the same FPGA every time, or can you lock up any
arbitrary user FPGA?  I've seen some effects (link reliability / BER,
ability to drive fiber CX4) where there is measurable variation between
different user fpgas.  So far, I have not correlated user-fpga-location to
firmware lockups, but wouldn't immediately rule it out, esp. if this
wasn't a design to be run on all corners.

We've only tried it on this one FPGA (user-2).  It's the output of our
coherent dispersion machine, and all 4 ports are used to send data to the
switch for distribution to the GPU machines.  One other FPGA has 4 XAUI
links, but they are only used to receive data on that FPGA, so it doesn't
really drive anything.

The way our logic works, all four ports fire off an 8K packet
simultaneously.  We've speculated that maybe that's too much drive power
at one instant.  We haven't done any probing with the scope on the power
supplies as of yet.

It's weird that whatever is going on would lock up the SelectMAP interface
between the FPGAs.

We always seem to find the corner cases.  Sigh.

John

Billy



-----Original Message-----
From: [email protected] on behalf of Matt Dexter
Sent: Wed 2/3/2010 8:14 PM
To: [email protected]
Subject: Re: [casper] 10 GbE ports on a BEE2

Hi,

I'm not sure this is too helpful as this system uses 10 Gbps XAUI and not
10 GbE (and I'm sure Billy Barott could say more) but for the record (and
since Dan asked) I believe the ATA Beamformer's 16 BEE2s are cabled as
follows.

ATA BF number of 10 Gbps ports used per BEE2 and FPGA:

BEE2   Center      Corner FPGAs
        FPGA     1     2     3     4
BF1
B01     0+0     3+0   1+2   4+0   3+0
B02     0+0     3+0   1+2   4+0   3+0
B03     0+0     3+0   1+2   4+0   3+0
B04     0+0     3+0   1+2   4+0   3+0
B05     0+0     2+2   3+0   3+0   0+0

BF2
B06     0+0     3+0   1+2   4+0   3+0
B07     0+0     3+0   1+2   4+0   3+0
B08     0+0     3+0   1+2   4+0   3+0
B09     0+0     3+0   1+2   4+0   3+0
B10     0+0     3+1   3+0   3+0   0+0

BF3
B11     0+0     3+0   1+2   4+0   3+0
B12     0+0     3+0   1+2   4+0   3+0
B13     0+0     3+0   1+2   4+0   3+0
B14     0+0     3+0   1+2   4+0   3+0
B15     0+0     2+1   3+0   3+0   0+0

etc
B16     0+0     0+0   2+1   1+2   1+0

First number is number of passive copper CX4 cables.
Second number is number of active fiber optic CX4 cables.
The maximum number of usable active CX4 cables per BEE2 is quite low
due to power supply limitations.
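
For reference, the per-board totals implied by the table above can be
tallied with a short script.  This is only a sketch: the rows are copied
from the table as posted (before the corrections Billy gives in his
reply), and the names ROWS and totals are illustrative.

```python
# Cable counts per FPGA written as "passive+active", copied from the table.
# Column order: center FPGA, then corner FPGAs 1 to 4.
ROWS = {
    "B01": "0+0 3+0 1+2 4+0 3+0",  # same row for B01-B04, B06-B09, B11-B14
    "B05": "0+0 2+2 3+0 3+0 0+0",
    "B10": "0+0 3+1 3+0 3+0 0+0",
    "B15": "0+0 2+1 3+0 3+0 0+0",
    "B16": "0+0 0+0 2+1 1+2 1+0",
}


def totals(row):
    """Return (passive copper, active fiber) cable totals for one board."""
    pairs = [cell.split("+") for cell in row.split()]
    passive = sum(int(p) for p, _ in pairs)
    active = sum(int(a) for _, a in pairs)
    return passive, active


per_board = {board: totals(row) for board, row in ROWS.items()}
max_active = max(active for _, active in per_board.values())

for board, (passive, active) in per_board.items():
    print(f"{board}: {passive} passive, {active} active")
print(f"most active cables on any one board: {max_active}")
```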

I believe various lab tests were done using 1 of the 2
high speed ports on the center FPGA but at the moment none
are in use at the observatory (as far as I know).

Matt Dexter

On Wed, 3 Feb 2010, John Ford wrote:

Hi all.  Has anyone seen any problems when using 4 10 GbE ports on a
single FPGA in the BEE2?

We're having some problems where borph access from the control FPGA seems
to lock up when we start up our designs.  We've done some testing with
hacked up designs, and we can only make it lock up (so far) when we have 4
10 gbe ports active.  Interestingly, some of the 10 GbE ports also seem to
lock up at that very instant.

The fpga still is running, but the control fpga cannot talk to it OR ANY
OF THE OTHER 3 FPGA's.  If you kill the borph process on the control fpga
and restart it, it comes back to life.

We don't know what goes on under the hood of the yellow block system, but
it seems that something is going wrong with whatever mechanism controls
the bus to the control fpga.

John
