[casper] BEE2 hanging

John Ford Fri, 29 Jan 2010 13:24:18 -0800

Hi all.

We're working hard on cleaning up our 800 MHz Coherent Dedispersion pulsar
machine for production.  We have it working with 8 GPU machines, and from
64 to 2048 coarse channels.


One problem we have involves our output FPGA, which rearranges the data
and ships it off simultaneously over four 10 GbE ports: sometimes sending
an arm() command (which tells the system to start on the next 1 PPS) locks
up communication with that FPGA.

The arm command (python) just does 2 writes to the same register, first
sending a zero, then sending a one after sleeping for a second.
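
A minimal sketch of the arm sequence described above, not our actual code.
The register name "ARM", the write_int callable, and the FakeFpga stand-in
are all hypothetical, used here so the example runs without hardware; in
practice write_int would be whatever register-write call the control
library provides.

```python
import time


def arm(write_int, reg_name="ARM", delay_s=1.0, sleep=time.sleep):
    """Write 0, wait, then write 1 to the same register.

    write_int is any callable taking (register_name, value).
    reg_name is a hypothetical register name, not the real one.
    """
    write_int(reg_name, 0)   # clear the arm bit
    sleep(delay_s)           # the original sleeps for one second here
    write_int(reg_name, 1)   # set the arm bit; start on the next 1 PPS


class FakeFpga:
    """Hypothetical stand-in that records writes instead of touching hardware."""

    def __init__(self):
        self.writes = []

    def write_int(self, name, value):
        self.writes.append((name, value))


fpga = FakeFpga()
arm(fpga.write_int, delay_s=0.0)  # zero delay only to keep the demo fast
```

Passing the write function in makes it easy to log or time each write when
chasing down which of the two locks up.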

If we kill the program that's trying to write to the FPGA, we can unload
the bof and reload it, and it starts working again.  Then it will fail
again on an arm() some random number of calls later.

It seems to fail more often if we run the system at high speed.  Paul says
it doesn't fail at all with a 200 MHz ADC clock rate, as opposed to our
usual 800 MHz.

Our previous design, which is for the regular guppi modes, does not do this.

Any ideas where to look for this?

Does trying to read or write a non-existent register make borph unhappy
enough to smite us?

Thanks for any insight.

John
