Thanks for all the pointers.  We're going to try a few experiments
tomorrow when we have the instrument available:

1) pull the 10 GbE cables off the ports and see if it makes any
difference.  I would expect the current draw to be reduced.

2) enable fewer 10 GbE ports.

3) insert a delay between each port firing off its packet.
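
For experiment 3, a rough sketch of the timing budget may help when
picking the delay.  It assumes an "8K packet" means 8192 bytes, that each
port moves 10 Gb/s of payload, and it ignores headers and inter-frame
gaps; the 22.5 Gb/s aggregate figure is the one quoted further down the
thread.  The function and constant names are made up for illustration.

```python
import math

LINE_RATE_BPS = 10e9      # assumed payload rate of one 10 GbE port
PACKET_BYTES = 8 * 1024   # "8K packet", assumed to mean 8192 bytes
REQUIRED_BPS = 22.5e9     # aggregate output rate quoted in the thread
NUM_PORTS = 4


def stagger_budget(line_rate=LINE_RATE_BPS, packet_bytes=PACKET_BYTES,
                   required=REQUIRED_BPS, num_ports=NUM_PORTS):
    """Return timing figures for evenly staggered packet starts."""
    packet_bits = packet_bytes * 8
    # Time one packet occupies a port.
    tx_time = packet_bits / line_rate
    # How often each port must start a packet to hit the aggregate rate.
    period = packet_bits * num_ports / required
    # Even spacing between the start times of successive ports.
    offset = period / num_ports
    # Ports on the wire just after a start: every port that began less
    # than tx_time ago, including the one that just started.
    peak = min(num_ports, math.ceil(tx_time / offset))
    return {
        "tx_time_s": tx_time,
        "period_s": period,
        "offset_s": offset,
        "peak_ports": peak,
    }


if __name__ == "__main__":
    for key, value in stagger_budget().items():
        print(key, value)
```

Under those assumptions each packet occupies a port for about 6.55 us,
and the required rate works out to an average of 2.25 ports busy at
once.  So even an evenly spaced stagger still leaves three ports
transmitting together at the peak, down from four.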

We'll also take a look at the power supplies if we can get at them.  I'll
take the temperature of the regulators.

We're only using 8 ports altogether, 4 xaui and 4 10 gbe.

Any other ideas?

John


> Good points Henry - thanks.
>
> If there are only a total of 4 or so XAUI ports running on 1 board
> a heatsink is probably not required.  But if many or all 18 are
> in use I strongly recommend a heatsink on P26.
>
> The ATA BEE2s have a heatsink on P26 just as you describe.
> Without them the XAUI portion of testsuite would misbehave after a
> short while.  This heatsink is the same type that is used 2 times on the
> larger power supplies P28 and P29.
>
> The heatsink makes a significant difference in the temperature
> according to finger touch but I don't have thermocouple
> measurements I can provide.
>
> Regardless of heatsink or not P26 will shut itself off if too many
> active CX4 cables are installed due to over-current condition.
>
> Matt Dexter
>
> On Thu, 4 Feb 2010, Henry Chen wrote:
>
>> Hi John, Billy,
>>
>> For what it's worth, the power distribution design on the BEE2 has
>> never been stellar. I haven't seen problems specifically with XAUI,
>> but during the massive orgy of BEE2 production and testing, it was
>> found that User FPGAs 2 and 3 tended to have higher DRAM error rates.
>> These 2 FPGAs are further from the power supplies, and I think they
>> (RAMP guys) had actually measured a noticeably larger voltage drop
>> for the DRAM power rail on these 2 FPGAs.
>>
>> What you're seeing may or may not be a variation on this theme...
>> though each FPGA actually has independent linear regulators for
>> the MGT power supplies, so the effect should be minimized. If
>> you're interested, you can try poking around those supplies; each
>> FPGA has 4 linear regulators on the bottom of the board in the 5-pin
>> TO-type packages. Two supply 2.5V, and two supply 1.8VTT for the
>> MGTs; both are powered by the main 3.3V rail on the board.
>>
>> Another simple thing to try would be to heatsink the main 3.3V
>> regulator (P26, the black one closest to FPGA 4), if it isn't
>> already. That one can get *hot*.
>>
>> Thanks,
>> Henry
>>
>>
>> John Ford wrote:
>>>> John,
>>>> Coincidentally, I've found user 2 to be the worst of the bunch across
>>>> the HCRO BEE2s by a good margin.  Some variation between units, of
>>>> course, but if it were me I'd try to run on a different slot (user 1
>>>> or 4) and see if it still breaks.  Esp. as most errors I see with
>>>> user 2 (FWD/REV BER, F.O. CX4 weirdness) can be tied (in some way) to
>>>> driving those ports.
>>>
>>> Interesting.  Thanks.  We can give that a try, but we'll have to modify
>>> our designs a bit. (a lot!)
>>>
>>>> Out of curiosity, have you tried staggering your output packets so
>>>> they fire in sequence rather than simultaneously?
>>>
>>> We're about to try that now!  Putting a few tens of clocks of delay in
>>> the chain.  We can't fully serialize them for obvious reasons.  We need
>>> at least 22.5 Gb/second out of the four ports.
>>>
>>> John
>>>
>>>> Billy
>>>>
>>>> -----Original Message-----
>>>> From: John Ford [mailto:[email protected]]
>>>> Sent: Wednesday, February 03, 2010 9:47 PM
>>>> To: Barott, William Chauncey
>>>> Cc: Matt Dexter; [email protected]
>>>> Subject: Re: [casper] 10 GbE ports on a BEE2
>>>>
>>>>> Matt-
>>>>> Good summary.  Let me correct recent changes:
>>>>> B05, B10, and B15 FPGA 4 each have two active XAUI ports (one of them
>>>>> runs both with fiber CX4, I think, the others are a mix), and one
>>>>> active 10GbE port.
>>>>> Have several copies of similar firmware running on B16.
>>>>>
>>>>> Correct that we do not run high speed on the center FPGA - corners
>>>>> only right now.  Also do not have any designs with more than one 10GbE
>>>>> port per FPGA (most of our interchip comm is XAUI), so I don't know
>>>>> how helpful these numbers are.
>>>>>
>>>>> John:
>>>>> Have you been locking up the same FPGA every time, or can you lock up
>>>>> any arbitrary user FPGA?  I've seen some effects (link reliability /
>>>>> BER, ability to drive fiber CX4) where there is measurable variation
>>>>> between different user fpgas.  So far, I have not correlated
>>>>> user-fpga-location to firmware lockups, but wouldn't immediately rule
>>>>> it out, esp. if this wasn't a design to be run on all corners.
>>>> We've only tried it on this one FPGA. (user-2)  It's the output of our
>>>> coherent dispersion machine, and all 4 ports are used to send data to
>>>> the switch for distribution to the GPU machines.  One other FPGA has 4
>>>> xaui links, but they are only used to receive data on that fpga, so it
>>>> doesn't really drive anything.
>>>>
>>>> The way our logic works, all four ports fire off an 8K packet
>>>> simultaneously.  We've speculated that maybe that's too much drive
>>>> power at one instant.  We haven't done any probing with the scope on
>>>> the power supplies as of yet.
>>>>
>>>> It's weird that whatever is going on would lock up the selectmap
>>>> interface between the FPGA's.
>>>>
>>>> We always seem to find the corner cases.  Sigh.
>>>>
>>>> John
>>>>
>>>>> Billy
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] on behalf of Matt Dexter
>>>>> Sent: Wed 2/3/2010 8:14 PM
>>>>> To: [email protected]
>>>>> Subject: Re: [casper] 10 GbE ports on a BEE2
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm not sure this is too helpful as this system uses 10 Gbps XAUI and
>>>>> not 10 GbE (and I'm sure Billy Barott could say more) but for the
>>>>> record (and since Dan asked) I believe the ATA Beamformer's 16 BEE2s
>>>>> are cabled as
>>>>>
>>>>> ATA BF number of 10 Gbps ports used per BEE2 and FPGA :
>>>>>
>>>>> BEE2   Center      Corner FPGAs
>>>>>         FPGA     1     2     3     4
>>>>> BF1
>>>>> B01     0+0     3+0   1+2   4+0   3+0
>>>>> B02     0+0     3+0   1+2   4+0   3+0
>>>>> B03     0+0     3+0   1+2   4+0   3+0
>>>>> B04     0+0     3+0   1+2   4+0   3+0
>>>>> B05     0+0     2+2   3+0   3+0   0+0
>>>>>
>>>>> BF2
>>>>> B06     0+0     3+0   1+2   4+0   3+0
>>>>> B07     0+0     3+0   1+2   4+0   3+0
>>>>> B08     0+0     3+0   1+2   4+0   3+0
>>>>> B09     0+0     3+0   1+2   4+0   3+0
>>>>> B10     0+0     3+1   3+0   3+0   0+0
>>>>>
>>>>> BF3
>>>>> B11     0+0     3+0   1+2   4+0   3+0
>>>>> B12     0+0     3+0   1+2   4+0   3+0
>>>>> B13     0+0     3+0   1+2   4+0   3+0
>>>>> B14     0+0     3+0   1+2   4+0   3+0
>>>>> B15     0+0     2+1   3+0   3+0   0+0
>>>>>
>>>>> etc
>>>>> B16     0+0     0+0   2+1   1+2   1+0
>>>>>
>>>>> First number is number of passive copper CX4 cables.
>>>>> Second number is number of active fiber optic CX4 cables.
>>>>> The maximum number of usable active CX4 cables per BEE2 is quite low
>>>>> due to power supply limitations.
>>>>>
>>>>> I believe various lab tests were done using 1 of the 2
>>>>> high speed ports on the center FPGA but at the moment none
>>>>> are in use at the observatory (as far as I know).
>>>>>
>>>>> Matt Dexter
>>>>>
>>>>> On Wed, 3 Feb 2010, John Ford wrote:
>>>>>
>>>>>> Hi all.  Has anyone seen any problems when using 4 10 GbE ports on a
>>>>>> single FPGA in the BEE2?
>>>>>>
>>>>>> We're having some problems where borph access from the control FPGA
>>>>>> seems to lock up when we start up our designs.  We've done some
>>>>>> testing with hacked up designs, and we can only make it lock up (so
>>>>>> far) when we have 4 10 gbe ports active.  Interestingly, some of the
>>>>>> 10 GbE ports also seem to lock up at that very instant.
>>>>>>
>>>>>> The fpga still is running, but the control fpga cannot talk to it OR
>>>>>> ANY OF THE OTHER 3 FPGA's.  If you kill the borph process on the
>>>>>> control fpga and restart it, it comes back to life.
>>>>>>
>>>>>> We don't know what goes on under the hood of the yellow block system,
>>>>>> but it seems that something is going wrong with whatever mechanism
>>>>>> controls the bus to the control fpga.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>


