Hi John,

If you can't figure this out, it might be useful to send your query
out to the BEE2 users group, as there are some experts there who are
not on the CASPER email list (e.g., Pierre Droz and Chen Chang designed
the BEE2, and Alex Krasnov hooked up a zillion XAUI cables to a zillion
BEE2s for the RAMP project).

best wishes,

Dan


On 2/4/2010 1:18 PM, John Ford wrote:
Thanks for all the pointers.  We're going to try a few experiments
tomorrow when we have the instrument available:

1) Pull off the 10 GbE cables from the ports and see if it makes any
difference.  I would suppose that the current would be reduced.

2) Enable fewer 10 GbE ports.

3) Insert a delay between each port firing off its packet.

We'll also take a look at the power supplies if we can get at them.  I'll
take the temperature of the regulators.

We're only using 8 ports altogether, 4 XAUI and 4 10 GbE.

Any other ideas?

John


Good points Henry - thanks.

If only a total of 4 or so XAUI ports are running on one board, a
heatsink is probably not required.  But if many or all 18 are in use,
I strongly recommend a heatsink on P26.

The ATA BEE2s have a heatsink on P26, just as you describe.
Without it, the XAUI portion of testsuite would misbehave after a
short while.  This heatsink is the same type that is used twice on the
larger power supplies, P28 and P29.

The heatsink makes a significant difference in temperature, judging by
finger touch, but I don't have thermocouple measurements I can provide.

Regardless of whether a heatsink is fitted, P26 will shut itself off if
too many active CX4 cables are installed, due to an over-current
condition.

Matt Dexter

On Thu, 4 Feb 2010, Henry Chen wrote:

Hi John, Billy,

For what it's worth, the power distribution design on the BEE2 has
never been stellar. I haven't seen problems specifically with XAUI,
but during the massive orgy of BEE2 production and testing, it was
found that User FPGAs 2 and 3 tended to have higher DRAM error rates.
These 2 FPGAs are further from the power supplies, and I think they
(RAMP guys) had actually measured a noticeably larger voltage drop
for the DRAM power rail on these 2 FPGAs.
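
For a feel of the scale involved, the drop is just the load current
times the distribution resistance between the supply and each FPGA.  A
minimal sketch of that arithmetic follows; the current and resistance
values are assumptions for illustration, not BEE2 measurements.

    # Ohm's-law sketch of the DRAM-rail drop vs. distance from the supply.
    # Load current and rail resistances are assumed illustrative values.
    I_DRAM = 3.0  # assumed DRAM-rail load current per FPGA, amps
    paths = {
        "FPGA near the supplies": 0.005,   # assumed resistance, ohms
        "FPGA far from the supplies": 0.015,
    }

    for label, r_ohm in paths.items():
        drop_mv = I_DRAM * r_ohm * 1000
        print(f"{label}: {I_DRAM} A x {r_ohm * 1000:.0f} mohm = {drop_mv:.0f} mV drop")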

What you're seeing may or may not be a variation on this theme...
though each FPGA actually has independent linear regulators for
the MGT power supplies, so the effect should be minimized. If
you're interested, you can try poking around those supplies; each
FPGA has 4 linear regulators on the bottom of the board in the 5-pin
TO-type packages. Two supply 2.5V and two supply 1.8VTT for the
MGTs; all four are powered from the main 3.3V rail on the board.
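
As a rough sense of why those regulators can run warm: a linear
regulator dissipates the full input-output differential times the load
current as heat.  A minimal sketch of that arithmetic, assuming
placeholder load currents rather than measured BEE2 values:

    # Linear-regulator dissipation sketch: P = (Vin - Vout) * Iload.
    # The load currents below are assumptions for illustration only.
    VIN = 3.3  # main 3.3 V rail feeding the MGT linear regulators

    rails = {
        "MGT 2.5 V regulator": (2.5, 0.5),    # (Vout, assumed Iload in amps)
        "MGT 1.8 VTT regulator": (1.8, 0.5),
    }

    for name, (vout, i_load) in rails.items():
        p_diss = (VIN - vout) * i_load
        print(f"{name}: ({VIN} - {vout}) V x {i_load} A = {p_diss:.2f} W as heat")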

Another simple thing to try would be to heatsink the main 3.3V
regulator (P26, the black one closest to FPGA 4), if it isn't
already. That one can get *hot*.

Thanks,
Henry


John Ford wrote:

John,
Coincidentally, I've found user 2 to be the worst of the bunch across
the HCRO BEE2s by a good margin.  Some variation between units, of
course, but if it were me I'd try to run on a different slot (user 1 or
4) and see if it still breaks.  Esp. as most errors I see with user 2
(FWD/REV BER, F.O. CX4 weirdness) can be tied (in some way) to driving
those ports.

Interesting.  Thanks.  We can give that a try, but we'll have to modify
our designs a bit. (a lot!)

Out of curiosity, have you tried staggering your output packets so they
fire in sequence rather than simultaneously?

We're about to try that now!  Putting a few tens of clocks of delay in
the chain.  We can't fully serialize them for obvious reasons.  We need
at least 22.5 Gb/second out of the four ports.
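
As a rough sanity check on how much room there is for that delay, here
is a back-of-the-envelope sketch.  Only the four ports and the 22.5
Gb/s aggregate come from the discussion; the 8192-byte packet size and
the 200 MHz fabric clock are assumptions for illustration.

    # Stagger-slack sketch for four 10 GbE ports sharing 22.5 Gb/s aggregate.
    # Packet size and fabric clock are assumed values, not from the design.
    PORTS = 4
    TOTAL_RATE = 22.5e9        # required aggregate output, bits/s
    LINE_RATE = 10e9           # 10 GbE line rate per port, bits/s
    PACKET_BITS = 8192 * 8     # assumed "8K" packet of 8192 bytes
    FABRIC_CLOCK = 200e6       # assumed FPGA fabric clock, Hz

    per_port_rate = TOTAL_RATE / PORTS            # average rate each port must sustain
    packet_period = PACKET_BITS / per_port_rate   # how often each port must send
    wire_time = PACKET_BITS / LINE_RATE           # time one packet occupies the link
    slack = packet_period - wire_time             # idle time usable for staggering

    print(f"per-port average rate: {per_port_rate / 1e9:.3f} Gb/s")
    print(f"packet period: {packet_period * 1e6:.2f} us, on the wire: {wire_time * 1e6:.2f} us")
    print(f"stagger slack: {slack * 1e6:.2f} us"
          f" (~{slack * FABRIC_CLOCK:.0f} fabric clocks at {FABRIC_CLOCK / 1e6:.0f} MHz)")

Under those assumptions each port sits idle for several microseconds
per packet period, so a few tens of clocks of stagger should cost
essentially nothing in throughput.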

John

Billy

-----Original Message-----
From: John Ford [mailto:[email protected]]
Sent: Wednesday, February 03, 2010 9:47 PM
To: Barott, William Chauncey
Cc: Matt Dexter; [email protected]
Subject: Re: [casper] 10 GbE ports on a BEE2

Matt-
Good summary.  Let me correct recent changes:
B05, B10, and B15 FPGA 4 each have two active XAUI ports (one of them
runs both with fiber CX4, I think, the others are a mix), and one
active 10GbE port.
Have several copies of similar firmware running on B16.

Correct that we do not run high speed on the center FPGA - corners only
right now.  Also do not have any designs with more than one 10GbE port
per FPGA (most of our interchip comm is XAUI), so I don't know how
helpful these numbers are.

John:
Have you been locking up the same FPGA every time, or can you lock up
any arbitrary user FPGA?  I've seen some effects (link reliability /
BER, ability to drive fiber CX4) where there is measurable variation
between different user FPGAs.  So far, I have not correlated
user-FPGA-location to firmware lockups, but wouldn't immediately rule
it out, esp. if this wasn't a design to be run on all corners.

We've only tried it on this one FPGA (user 2).  It's the output of our
coherent dispersion machine, and all 4 ports are used to send data to
the switch for distribution to the GPU machines.  One other FPGA has 4
XAUI links, but they are only used to receive data on that FPGA, so it
doesn't really drive anything.

The way our logic works, all four ports fire off an 8K packet
simultaneously.  We've speculated that maybe that's too much drive
power at one instant.  We haven't done any probing with the scope on
the power supplies as of yet.

It's weird that whatever is going on would lock up the SelectMAP
interface between the FPGAs.

We always seem to find the corner cases.  Sigh.

John

Billy



-----Original Message-----
From: [email protected] on behalf of Matt Dexter
Sent: Wed 2/3/2010 8:14 PM
To: [email protected]
Subject: Re: [casper] 10 GbE ports on a BEE2

Hi,

I'm not sure this is too helpful, as this system uses 10 Gbps XAUI and
not 10 GbE (and I'm sure Billy Barott could say more), but for the
record (and since Dan asked) I believe the ATA Beamformer's 16 BEE2s
are cabled as shown below.

ATA BF number of 10 Gbps ports used per BEE2 and FPGA:

BEE2   Center      Corner FPGAs
         FPGA     1     2     3     4
BF1
B01     0+0     3+0   1+2   4+0   3+0
B02     0+0     3+0   1+2   4+0   3+0
B03     0+0     3+0   1+2   4+0   3+0
B04     0+0     3+0   1+2   4+0   3+0
B05     0+0     2+2   3+0   3+0   0+0

BF2
B06     0+0     3+0   1+2   4+0   3+0
B07     0+0     3+0   1+2   4+0   3+0
B08     0+0     3+0   1+2   4+0   3+0
B09     0+0     3+0   1+2   4+0   3+0
B10     0+0     3+1   3+0   3+0   0+0

BF3
B11     0+0     3+0   1+2   4+0   3+0
B12     0+0     3+0   1+2   4+0   3+0
B13     0+0     3+0   1+2   4+0   3+0
B14     0+0     3+0   1+2   4+0   3+0
B15     0+0     2+1   3+0   3+0   0+0

etc
B16     0+0     0+0   2+1   1+2   1+0

First number is number of passive copper CX4 cables.
Second number is number of active fiber optic CX4 cables.
The maximum number of usable active CX4 cables per BEE2 is quite low
due to power supply limitations.
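
For anyone who wants to total these up per board, here is a small
sketch that tallies a few rows of the table (counts transcribed from
above; only a handful of boards are included, purely for illustration):

    # Tally copper vs. active fiber CX4 cables per BEE2 from the table above.
    # Each entry lists (copper, fiber) per FPGA: center, then corners 1-4.
    boards = {
        "B01": [(0, 0), (3, 0), (1, 2), (4, 0), (3, 0)],
        "B05": [(0, 0), (2, 2), (3, 0), (3, 0), (0, 0)],
        "B10": [(0, 0), (3, 1), (3, 0), (3, 0), (0, 0)],
        "B16": [(0, 0), (0, 0), (2, 1), (1, 2), (1, 0)],
        # ... remaining boards follow the same pattern
    }

    for name, fpgas in boards.items():
        copper = sum(c for c, f in fpgas)
        fiber = sum(f for c, f in fpgas)
        print(f"{name}: {copper} copper CX4, {fiber} active fiber CX4")

The fiber count is the one that matters for the power supply limit just
mentioned.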

I believe various lab tests were done using 1 of the 2
high speed ports on the center FPGA but at the moment none
are in use at the observatory (as far as I know).

Matt Dexter

On Wed, 3 Feb 2010, John Ford wrote:

Hi all.  Has anyone seen any problems when using 4 10 GbE ports on a
single FPGA in the BEE2?

We're having some problems where BORPH access from the control FPGA
seems to lock up when we start up our designs.  We've done some testing
with hacked up designs, and we can only make it lock up (so far) when
we have 4 10 GbE ports active.  Interestingly, some of the 10 GbE ports
also seem to lock up at that very instant.

The FPGA is still running, but the control FPGA cannot talk to it OR
ANY OF THE OTHER 3 FPGAs.  If you kill the BORPH process on the control
FPGA and restart it, it comes back to life.

We don't know what goes on under the hood of the yellow block system,
but it seems that something is going wrong with whatever mechanism
controls the bus to the control FPGA.

John
