Hi all. We're working hard on cleaning up our 800 MHz Coherent Dedispersion pulsar machine for production. We have it working with 8 GPU machines, and from 64 to 2048 coarse channels.
One problem we have involves our output FPGA, which rearranges the data and ships it out simultaneously over four 10 GbE ports. Sometimes sending an arm() command (which tells the system to start on the next 1 PPS) locks up communication with that FPGA. The arm command (Python) just does two writes to the same register: it sends a zero, sleeps for a second, then sends a one.

If we kill the program that is trying to write to the FPGA, we can unload the bof and reload it, and it starts working again. It then fails again on some later arm(), after a random number of calls. It seems to fail more often if we run the system at high speed. Paul says it does not fail at all with a 200 MHz ADC clock, as opposed to our usual 800 MHz. Our previous design, the one for the regular GUPPI modes, does not do this.

Any ideas where to look for this? Does trying to read or write a non-existent register make BORPH unhappy enough to smite us?

Thanks for any insight.
John
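
P.S. In case it helps to see the sequence spelled out, here is a minimal sketch of what arm() does. This is not our actual code: the register name "ARM", the FakeFpga class and its write_int method are stand-ins made up for illustration, so the sketch runs without any hardware.

```python
import time


class FakeFpga:
    """Hypothetical stand-in for the real FPGA client; it only records writes."""

    def __init__(self):
        self.writes = []

    def write_int(self, name, value):
        # A real client would write to the named register here.
        self.writes.append((name, value))


def arm(fpga, reg="ARM", delay=1.0, sleep=time.sleep):
    """Write 0, wait, then write 1 to the same register.

    The register name and the one-second default are assumptions taken
    from the description above, not from the actual design.
    """
    fpga.write_int(reg, 0)   # clear the arm bit
    sleep(delay)             # wait about a second
    fpga.write_int(reg, 1)   # set the arm bit; start on the next 1 PPS
```

Calling arm(FakeFpga()) should leave exactly two recorded writes, a 0 followed by a 1, which is all the real command does as far as the register interface is concerned.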