Thanks for all the pointers. We're going to try a few experiments tomorrow when we have the instrument available:
1) Pull the 10 GbE cables off the ports and see if it makes any
   difference. I would suppose that the current draw would be reduced.
2) Enable fewer 10 GbE ports.
3) Insert a delay between each port firing off its packet.

We'll also take a look at the power supplies if we can get at them, and
I'll take the temperature of the regulators. We're only using 8 ports
altogether: 4 XAUI and 4 10 GbE.

Any other ideas?

John

> Good points Henry - thanks.
>
> If there are only a total of 4 or so XAUI ports running on one board,
> a heatsink is probably not required. But if many or all 18 are in
> use, I strongly recommend a heatsink on P26.
>
> The ATA BEE2s have a heatsink on P26 just as you describe. Without
> them, the XAUI portion of testsuite would misbehave after a short
> while. This heatsink is the same type that is used twice on the
> larger power supplies, P28 and P29.
>
> The heatsink makes a significant difference in the temperature
> judging by finger touch, but I don't have thermocouple measurements
> I can provide.
>
> Heatsink or not, P26 will shut itself off in an over-current
> condition if too many active CX4 cables are installed.
>
> Matt Dexter
>
> On Thu, 4 Feb 2010, Henry Chen wrote:
>
>> Hi John, Billy,
>>
>> For what it's worth, the power distribution design on the BEE2 has
>> never been stellar. I haven't seen problems specifically with XAUI,
>> but during the massive orgy of BEE2 production and testing, it was
>> found that User FPGAs 2 and 3 tended to have higher DRAM error
>> rates. These two FPGAs are farther from the power supplies, and I
>> think the RAMP guys had actually measured a noticeably larger
>> voltage drop on the DRAM power rail for these two FPGAs.
>>
>> What you're seeing may or may not be a variation on this theme...
>> though each FPGA actually has independent linear regulators for the
>> MGT power supplies, so the effect should be minimized. If you're
>> interested, you can try poking around those supplies; each FPGA has
>> four linear regulators on the bottom of the board in 5-pin TO-type
>> packages. Two supply 2.5V, and two supply 1.8VTT for the MGTs; both
>> are powered by the main 3.3V rail on the board.
>>
>> Another simple thing to try would be to heatsink the main 3.3V
>> regulator (P26, the black one closest to FPGA 4), if it isn't
>> already. That one can get *hot*.
>>
>> Thanks,
>> Henry
>>
>>
>> John Ford wrote:
>>>> John,
>>>> Coincidentally, I've found user 2 to be the worst of the bunch
>>>> across the HCRO BEE2s by a good margin. There's some variation
>>>> between units, of course, but if it were me I'd try running on a
>>>> different slot (user 1 or 4) and see if it still breaks,
>>>> especially as most errors I see with user 2 (FWD/REV BER,
>>>> fiber-optic CX4 weirdness) can be tied in some way to driving
>>>> those ports.
>>>
>>> Interesting. Thanks. We can give that a try, but we'll have to
>>> modify our designs a bit. (A lot!)
>>>
>>>> Out of curiosity, have you tried staggering your output packets
>>>> so they fire in sequence rather than simultaneously?
>>>
>>> We're about to try that now! We're putting a few tens of clocks of
>>> delay in the chain. We can't fully serialize them, for obvious
>>> reasons: we need at least 22.5 Gb/s out of the four ports.
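>>>
>>> (For concreteness: the idea is to give each port its own
>>> software-settable delay and set the offsets from the control side.
>>> A rough sketch only -- the "portN_tx_delay" register would be a
>>> hypothetical addition to our design, and the pid is made up; BORPH
>>> exposes design registers as files under /proc/<pid>/hw/ioreg/:)
>>>
>>>     # stagger.py - offset the four ports' transmit start times
>>>     import struct
>>>
>>>     PID = 1234    # pid of the running hardware process (example)
>>>     STEP = 32     # clocks of delay between successive ports
>>>
>>>     for port in range(4):
>>>         reg = '/proc/%d/hw/ioreg/port%d_tx_delay' % (PID, port)
>>>         f = open(reg, 'wb')
>>>         f.write(struct.pack('>I', port * STEP))  # 32-bit BE write
>>>         f.close()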
>>>
>>> John
>>>
>>>> Billy
>>>>
>>>> -----Original Message-----
>>>> From: John Ford [mailto:[email protected]]
>>>> Sent: Wednesday, February 03, 2010 9:47 PM
>>>> To: Barott, William Chauncey
>>>> Cc: Matt Dexter; [email protected]
>>>> Subject: Re: [casper] 10 GbE ports on a BEE2
>>>>
>>>>> Matt -
>>>>> Good summary. Let me correct it for recent changes:
>>>>> B05, B10, and B15 FPGA 4 each have two active XAUI ports (one of
>>>>> them runs both with fiber CX4, I think; the others are a mix)
>>>>> and one active 10GbE port.
>>>>> We have several copies of similar firmware running on B16.
>>>>>
>>>>> Correct that we do not run high speed on the center FPGA -
>>>>> corners only right now. We also do not have any designs with
>>>>> more than one 10GbE port per FPGA (most of our interchip comm is
>>>>> XAUI), so I don't know how helpful these numbers are.
>>>>>
>>>>> John:
>>>>> Have you been locking up the same FPGA every time, or can you
>>>>> lock up any arbitrary user FPGA? I've seen some effects (link
>>>>> reliability / BER, ability to drive fiber CX4) where there is
>>>>> measurable variation between different user FPGAs. So far, I
>>>>> have not correlated user-FPGA location to firmware lockups, but
>>>>> I wouldn't immediately rule it out, especially if this wasn't a
>>>>> design meant to run on all corners.
>>>>
>>>> We've only tried it on this one FPGA (user 2). It's the output of
>>>> our coherent dispersion machine, and all 4 ports are used to send
>>>> data to the switch for distribution to the GPU machines. One
>>>> other FPGA has 4 XAUI links, but they are only used to receive
>>>> data on that FPGA, so it doesn't really drive anything.
>>>>
>>>> The way our logic works, all four ports fire off an 8K packet
>>>> simultaneously. We've speculated that maybe that's too much drive
>>>> power at one instant. We haven't done any probing of the power
>>>> supplies with the scope as of yet.
>>>>
>>>> It's weird that whatever is going on would lock up the SelectMAP
>>>> interface between the FPGAs.
>>>>
>>>> We always seem to find the corner cases. Sigh.
>>>>
>>>> John
>>>>
>>>>> Billy
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] on behalf of Matt Dexter
>>>>> Sent: Wed 2/3/2010 8:14 PM
>>>>> To: [email protected]
>>>>> Subject: Re: [casper] 10 GbE ports on a BEE2
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm not sure this is too helpful, as this system uses 10 Gbps
>>>>> XAUI and not 10 GbE (and I'm sure Billy Barott could say more),
>>>>> but for the record (and since Dan asked), I believe the ATA
>>>>> Beamformer's 16 BEE2s are cabled as follows.
>>>>>
>>>>> Number of 10 Gbps ports used per ATA BF BEE2 and FPGA:
>>>>>
>>>>>         Center  Corner FPGAs
>>>>> BEE2    FPGA    1     2     3     4
>>>>> BF1
>>>>> B01     0+0     3+0   1+2   4+0   3+0
>>>>> B02     0+0     3+0   1+2   4+0   3+0
>>>>> B03     0+0     3+0   1+2   4+0   3+0
>>>>> B04     0+0     3+0   1+2   4+0   3+0
>>>>> B05     0+0     2+2   3+0   3+0   0+0
>>>>>
>>>>> BF2
>>>>> B06     0+0     3+0   1+2   4+0   3+0
>>>>> B07     0+0     3+0   1+2   4+0   3+0
>>>>> B08     0+0     3+0   1+2   4+0   3+0
>>>>> B09     0+0     3+0   1+2   4+0   3+0
>>>>> B10     0+0     3+1   3+0   3+0   0+0
>>>>>
>>>>> BF3
>>>>> B11     0+0     3+0   1+2   4+0   3+0
>>>>> B12     0+0     3+0   1+2   4+0   3+0
>>>>> B13     0+0     3+0   1+2   4+0   3+0
>>>>> B14     0+0     3+0   1+2   4+0   3+0
>>>>> B15     0+0     2+1   3+0   3+0   0+0
>>>>>
>>>>> etc.
>>>>> B16     0+0     0+0   2+1   1+2   1+0
>>>>>
>>>>> The first number is the number of passive copper CX4 cables; the
>>>>> second is the number of active fiber-optic CX4 cables.
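>>>>>
>>>>> (If you want to tally the fiber count per board from the table,
>>>>> a throwaway sketch like this works; only two rows are
>>>>> transcribed here:)
>>>>>
>>>>>     # Sum the active fiber CX4 cables (second number) per BEE2.
>>>>>     rows = {
>>>>>         'B01': ['0+0', '3+0', '1+2', '4+0', '3+0'],
>>>>>         'B05': ['0+0', '2+2', '3+0', '3+0', '0+0'],
>>>>>     }
>>>>>     for bee2 in sorted(rows):
>>>>>         fiber = sum(int(e.split('+')[1]) for e in rows[bee2])
>>>>>         print('%s: %d active fiber CX4 cables' % (bee2, fiber))
>>>>>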
>>>>> The maximum number of usable active CX4 cables per BEE2 is
>>>>> quite low due to power supply limitations.
>>>>>
>>>>> I believe various lab tests were done using one of the two
>>>>> high-speed ports on the center FPGA, but at the moment none are
>>>>> in use at the observatory (as far as I know).
>>>>>
>>>>> Matt Dexter
>>>>>
>>>>> On Wed, 3 Feb 2010, John Ford wrote:
>>>>>
>>>>>> Hi all. Has anyone seen any problems when using four 10 GbE
>>>>>> ports on a single FPGA in the BEE2?
>>>>>>
>>>>>> We're having a problem where BORPH access from the control FPGA
>>>>>> seems to lock up when we start up our designs. We've done some
>>>>>> testing with hacked-up designs, and so far we can only make it
>>>>>> lock up when we have four 10 GbE ports active. Interestingly,
>>>>>> some of the 10 GbE ports also seem to lock up at that very
>>>>>> instant.
>>>>>>
>>>>>> The FPGA is still running, but the control FPGA cannot talk to
>>>>>> it OR ANY OF THE OTHER 3 FPGAs. If you kill the BORPH process
>>>>>> on the control FPGA and restart it, it comes back to life.
>>>>>>
>>>>>> We don't know what goes on under the hood of the yellow block
>>>>>> system, but it seems that something is going wrong with
>>>>>> whatever mechanism controls the bus to the control FPGA.
>>>>>>
>>>>>> John
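>>>>>>
>>>>>> P.S. To be concrete about "cannot talk to it": register reads
>>>>>> through the /proc interface simply stop returning. Something
>>>>>> like this (the pid and register name are placeholders) never
>>>>>> completes until we restart the BORPH process:
>>>>>>
>>>>>>     # Probe one design register; a healthy read returns almost
>>>>>>     # instantly, so hitting the alarm means the bus has hung.
>>>>>>     import signal
>>>>>>
>>>>>>     def timed_out(signum, frame):
>>>>>>         raise IOError('register read timed out - bus hung?')
>>>>>>
>>>>>>     signal.signal(signal.SIGALRM, timed_out)
>>>>>>     signal.alarm(5)      # generous timeout for a healthy read
>>>>>>     f = open('/proc/1234/hw/ioreg/sys_scratchpad', 'rb')
>>>>>>     data = f.read(4)     # blocks here when the design is hung
>>>>>>     signal.alarm(0)
>>>>>>     f.close()
>>>>>>     print('read ok: %r' % data)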

