Re: [casper] Roach-2 crashing fix
So I confess to relying on third parties for this information, but isn't the board populated with 1Gb RAM after all? Would the crash be triggered by a kernel memory layout of 3Gb+1Gb rather than 2Gb+2Gb? Have you tried the kernel from 9 months ago at github ska-sa/roach2_nfs_uboot?
regards
marc
On Wed, Jun 24, 2015 at 11:49 PM, John Ford jf...@nrao.edu wrote:
Hi all. We were having problems with multiple sequential progdev calls failing on our ROACH-2 systems. We were testing multiple bof files in a loop, and the roach would fall over and crash completely, and after the kernel panic, it would reboot itself. After a great deal of concentrated debugging effort this afternoon by Jack, David, Justin, Ryan, Arindam, Randy, and me, the cause of the crashing upon multiple progdev calls was found. It turned out to have nothing to do with programming the chip; rather, it was a problem with memory allocation by the operating system. Jack found that the problem could also be triggered by allocating a huge array in Python, using lots of memory. The problem was caused by the kernel thinking that the ROACH has 768 MB of memory on board, when in fact it has only 512 MB. The fix is to pass the real amount of memory to the kernel in the bootargs. The systems have been mostly working for a long time (years!), so you may want to check that your systems know in fact how much memory they have. If you start up top you can see what it thinks, or look in /proc/meminfo.
John
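The check John describes (compare what the kernel reports in /proc/meminfo with what is physically installed) can be scripted. The following is only a minimal sketch: it assumes the usual Linux "MemTotal: N kB" line, takes 512 MB as the installed size from John's message, and the function names are made up for illustration rather than taken from any CASPER tool.

```python
def parse_mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:   515432 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal line not found")


def read_mem_total_kb(path="/proc/meminfo"):
    """Read MemTotal from the running kernel (Linux only)."""
    with open(path) as f:
        return parse_mem_total_kb(f.read())


def kernel_overestimates(mem_total_kb, installed_mb):
    """True if the kernel believes it has more RAM than is installed.

    MemTotal is normally a little below the installed size, because the
    kernel reserves some memory for itself, so any value above the
    installed size suggests a wrong memory size was passed at boot.
    """
    return mem_total_kb > installed_mb * 1024
```

On a board, something like `kernel_overestimates(read_mem_total_kb(), installed_mb=512)` returning True would suggest the bootargs need correcting. The kernel's `mem=` parameter is the usual way to state the memory size explicitly, though the thread does not say exactly which bootargs were changed.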
Re: [casper] ROACH2 one_gbe block losing data on reception
Hi Henno,
Thanks for replying. I meant to mention the Simulink clock is 250 MHz and the design meets timing. The ultimate goal is loading waveforms into the DRAM since there isn't a CPU DRAM interface for ROACH2 yet, so any speed better than loading via KATCP would be great. But I've tried delays of up to 10 ms between each packet (i.e. down to about 100 KB/s), and am still seeing this behavior. The data loss fraction doesn't seem to improve substantially with reduced data rate. This leads me to suspect some sort of signal integrity issue, but I'm not sure where.
Glenn
On Jul 28, 2015 12:50 AM, Henno Kriel he...@ska.ac.za wrote:
Hi Glenn, I assume you are running your fabric at 125 MHz? What is the data rate you are trying to sustain?
HK
On Mon, Jul 27, 2015 at 10:06 PM, G Jones glenn.calt...@gmail.com wrote:
Hi, I'm experimenting with the one_gbe block on ROACH2. So far data transmission looks flawless; I can capture all the bytes I send. However, receiving data from a computer seems to result in missing data. My design is very simple: I have rx_ack tied high, and then counters on rx_val and the rising edge of rx_eof and also on rx_overrun and rx_badframe. If I just send a single packet, all looks good: the rx_val counter shows the number of bytes I sent, and rx_eof shows one packet received. But when I try to send a sequence of packets, I end up with the rx_val counter showing fewer bytes than I sent, by about 0.5-5% depending on the exact combination of parameters I use in sending packets. Sometimes all the ~1024 packets arrive, as indicated by the rx_eof counter, but other times it seems 1-3 are missing. I see consistent results whether the packet payload is 1024 bytes, 4096 bytes, or 8192 bytes. I never see any counts on the rx_overrun or rx_badframe line.
I am using sendall to send the packets from the host computer, and it reports that all bytes are being sent, so I assume it's not dropping them on the way outbound (plus it seems like it would be weird for the host computer to send partial packets). I've tried this test with two different network adapters (both directly connected to the ROACH2 fabric Ethernet port) and with various lengths of CAT 6 Ethernet cables. Has anyone used the one_gbe block in this way? Am I missing something?
Thanks, Glenn
-- Kind regards, Henno Kriel Manager: Hardware Engineering SKA South Africa (p) +27 (0)21 506 7374 (direct) (m) +27 (0)84 504 5050 web: www.ska.ac.za
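For reference, the arithmetic behind Glenn's figures is easy to reproduce: a 1024-byte payload sent every 10 ms is 102400 bytes/s, which is the "about 100 KB/s" he quotes. The sketch below is illustrative only. The helper names are hypothetical, and it ignores the time the packet spends on the wire, which is small next to a 10 ms gap on a gigabit link.

```python
def data_rate_bytes_per_s(payload_bytes, delay_s):
    """Approximate payload rate when one packet is sent every delay_s seconds."""
    return payload_bytes / float(delay_s)


def loss_fraction(bytes_sent, bytes_counted):
    """Fraction of sent bytes that the rx_val counter did not see."""
    return (bytes_sent - bytes_counted) / float(bytes_sent)


if __name__ == "__main__":
    # Payload sizes mentioned in the thread, with the 10 ms inter-packet delay.
    for payload in (1024, 4096, 8192):
        rate = data_rate_bytes_per_s(payload, 0.010)
        print("%d-byte payload: %.1f KB/s" % (payload, rate / 1000.0))
```

Computing `loss_fraction` from the host's byte count and the rx_val counter after each run would show directly whether the loss changes with the delay.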
[casper] Using katcp to Communicate to ROACH 2
Hello, Similar to Victor's email earlier today, I am writing to ask for instructions on using a ROACH 2 with a PC terminal. I can communicate with the ROACH 2 via a telnet connection, but I am unable to transfer boffiles to the board and execute them. Also, I am unable to complete the tutorials listed at the following link, https://casper.berkeley.edu/wiki/Workshop_2011_Tutorials, because my PC does not recognize the ROACH as a hardware platform. Help with either of these issues would be greatly appreciated. Thank you, Christopher Barnes
Re: [casper] Roach-2 crashing fix
Hi, Marc,
On Jul 28, 2015, at 1:34 AM, Marc Welz wrote:
So I confess to relying on third parties for this information, but isn't the board populated with 1Gb RAM after all?
When U-Boot starts up it reports that the system has 512 MB of memory. I assume (uh-oh!) that U-Boot is detecting that size dynamically at run time. Is it possible that later production runs of the ROACH2 were populated with larger capacity memory chips?
Would the crash be triggered by a kernel memory layout of 3Gb+1Gb rather than 2Gb+2Gb?
I don't understand this question. Can you please clarify? I don't think it's a layout issue, but rather a size issue.
Have you tried the kernel from 9 months ago at github ska-sa/roach2_nfs_uboot?
I'll have to double check the kernel version that we used.
Thanks, Dave
Re: [casper] Roach-2 crashing fix
So I confess to relying on third parties for this information, but isn't the board populated with 1Gb RAM after all? Would the crash be triggered by a kernel memory layout of 3Gb+1Gb rather than 2Gb+2Gb?
Certainly if the kernel thinks the layout of memory isn't what it really is, it could crash. We'll look into this a bit more.
Have you tried the kernel from 9 months ago at github ska-sa/roach2_nfs_uboot?
No, we haven't, as far as I know.
John
regards
marc
On Wed, Jun 24, 2015 at 11:49 PM, John Ford jf...@nrao.edu wrote:
Hi all. We were having problems with multiple sequential progdev calls failing on our ROACH-2 systems. We were testing multiple bof files in a loop, and the roach would fall over and crash completely, and after the kernel panic, it would reboot itself. After a great deal of concentrated debugging effort this afternoon by Jack, David, Justin, Ryan, Arindam, Randy, and me, the cause of the crashing upon multiple progdev calls was found. It turned out to have nothing to do with programming the chip; rather, it was a problem with memory allocation by the operating system. Jack found that the problem could also be triggered by allocating a huge array in Python, using lots of memory. The problem was caused by the kernel thinking that the ROACH has 768 MB of memory on board, when in fact it has only 512 MB. The fix is to pass the real amount of memory to the kernel in the bootargs. The systems have been mostly working for a long time (years!), so you may want to check that your systems know in fact how much memory they have. If you start up top you can see what it thinks, or look in /proc/meminfo.
John
Re: [casper] ROACH2 one_gbe block losing data on reception
Hi Glenn,
Everything seems to be in order. However, if there were signal integrity issues, the MAC (temac) would report bad frames due to CRC failure (the CRC is calculated over the entire Ethernet frame). Another sanity check would be to send this data to another PC and confirm that all the packets are received.
HK
On Tue, Jul 28, 2015 at 9:10 AM, G Jones glenn.calt...@gmail.com wrote:
Hi Henno,
Thanks for replying. I meant to mention the Simulink clock is 250 MHz and the design meets timing. The ultimate goal is loading waveforms into the DRAM since there isn't a CPU DRAM interface for ROACH2 yet, so any speed better than loading via KATCP would be great. But I've tried delays of up to 10 ms between each packet (i.e. down to about 100 KB/s), and am still seeing this behavior. The data loss fraction doesn't seem to improve substantially with reduced data rate. This leads me to suspect some sort of signal integrity issue, but I'm not sure where.
Glenn
On Jul 28, 2015 12:50 AM, Henno Kriel he...@ska.ac.za wrote:
Hi Glenn, I assume you are running your fabric at 125 MHz? What is the data rate you are trying to sustain?
HK
On Mon, Jul 27, 2015 at 10:06 PM, G Jones glenn.calt...@gmail.com wrote:
Hi, I'm experimenting with the one_gbe block on ROACH2. So far data transmission looks flawless; I can capture all the bytes I send. However, receiving data from a computer seems to result in missing data. My design is very simple: I have rx_ack tied high, and then counters on rx_val and the rising edge of rx_eof and also on rx_overrun and rx_badframe. If I just send a single packet, all looks good: the rx_val counter shows the number of bytes I sent, and rx_eof shows one packet received. But when I try to send a sequence of packets, I end up with the rx_val counter showing fewer bytes than I sent, by about 0.5-5% depending on the exact combination of parameters I use in sending packets.
Sometimes all the ~1024 packets arrive, as indicated by the rx_eof counter, but other times it seems 1-3 are missing. I see consistent results whether the packet payload is 1024 bytes, 4096 bytes, or 8192 bytes. I never see any counts on the rx_overrun or rx_badframe line. I am using sendall to send the packets from the host computer, and it reports that all bytes are being sent, so I assume it's not dropping them on the way outbound (plus it seems like it would be weird for the host computer to send partial packets). I've tried this test with two different network adapters (both directly connected to the ROACH2 fabric Ethernet port) and with various lengths of CAT 6 Ethernet cables. Has anyone used the one_gbe block in this way? Am I missing something?
Thanks, Glenn
-- Kind regards, Henno Kriel Manager: Hardware Engineering SKA South Africa (p) +27 (0)21 506 7374 (direct) (m) +27 (0)84 504 5050 web: www.ska.ac.za
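Henno's sanity check (send the same data to another PC and confirm every packet arrives) could look roughly like the sketch below. It is written under assumptions the thread does not confirm: that the packets are UDP datagrams, and that the sender uses a connected socket with sendall as Glenn describes. The function names, packet count and delay are illustrative. As written it runs sender and receiver on the loopback interface so it can be tried on one machine.

```python
import socket
import threading
import time


def count_datagrams(sock, expected, result, timeout_s=2.0):
    """Count datagrams and payload bytes until `expected` arrive or a timeout."""
    sock.settimeout(timeout_s)
    packets = 0
    nbytes = 0
    try:
        while packets < expected:
            data, _addr = sock.recvfrom(65535)
            packets += 1
            nbytes += len(data)
    except socket.timeout:
        pass  # stop counting; whatever is missing shows up in the totals
    result["packets"] = packets
    result["bytes"] = nbytes


def run_check(n_packets=20, payload_bytes=1024, delay_s=0.001):
    """Send n_packets UDP datagrams over loopback and report what arrived."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    result = {}
    receiver = threading.Thread(target=count_datagrams,
                                args=(rx, n_packets, result))
    receiver.start()

    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.connect(rx.getsockname())  # connected UDP socket, so sendall works
    payload = b"\x00" * payload_bytes
    for _ in range(n_packets):
        tx.sendall(payload)
        time.sleep(delay_s)

    receiver.join()
    tx.close()
    rx.close()
    return result
```

For a real test the receiver half would run on the second PC, bound to that machine's address and a known port, with the sender pointed at it. If every packet and byte arrives there, the sending host is unlikely to be the source of the loss.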
[casper] Communicating to ROACH 2
Hello everybody, I'm writing to ask for some detailed guidance on getting my Linux PC to communicate with the ROACH 2 board. I have compiled the Simulink project successfully and I'm able to watch the u-boot process on a minicom terminal. However, I don't know how to proceed at this point. Could anybody provide basic information about this issue? The information in the tutorials on the CASPER website wasn't enough for me. Thank you in advance, Victor Cardoso.
Re: [casper] Communicating to ROACH 2
Hello Victor,
Are you planning on using telnet or python? If python, check out the casperfpga module on the ska-sa GitHub. It's not documented in the tutorials unfortunately, and it takes a little bit of doing to install, but once you're there it's very easy to use, especially with ipython. I've recently gone through a similar process with ROACH 1, so if you don't come right by tomorrow, I (or someone more experienced than I) can give you some more direction.
Regards, James
On Tue, Jul 28, 2015 at 8:57 PM, Victor Cardoso victorcardoso...@hotmail.com wrote:
Hello everybody, I'm writing to ask you some detailed guidance to communicate my linux PC to the Roach 2 board. I have the Simulink project compiled successfully and I'm able to watch the u-boot process on minicom terminal. However, I don't know how to proceed at this point. May anybody provide basic information about this issue? The pieces of information from some tutorials at CASPER website weren't enough for me. Thank you in advance, Victor Cardoso.
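For the telnet route James mentions, the exchange is plain-text KATCP: a request line starting with "?", zero or more inform lines starting with "#", and a final reply line starting with "!". The sketch below shows that exchange from Python with only the standard library. It is a rough illustration under assumptions: port 7147 is the usual KATCP server port on ROACH boards as far as I know, the helper names are invented here, and arguments containing spaces (which KATCP requires to be escaped) are not handled. casperfpga wraps this sort of traffic in a much friendlier interface.

```python
import socket


def katcp_request(sock_file, name, *args):
    """Send one KATCP request and return (reply_args, inform_args_list).

    sock_file is a binary read/write file object wrapping the socket.
    """
    line = " ".join(("?" + name,) + args) + "\n"
    sock_file.write(line.encode("ascii"))
    sock_file.flush()
    informs = []
    while True:
        raw = sock_file.readline()
        if not raw:
            raise IOError("connection closed before a reply arrived")
        words = raw.decode("ascii", "replace").split()
        if not words:
            continue
        if words[0] == "!" + name:
            return words[1:], informs   # e.g. ["ok", ...] or ["fail", ...]
        if words[0] == "#" + name:
            informs.append(words[1:])   # informs that belong to this request
        # other informs (such as the greeting sent on connect) are skipped


def connect(host, port=7147, timeout_s=5.0):
    """Open a TCP connection; the default port is an assumption, see above."""
    sock = socket.create_connection((host, port), timeout_s)
    return sock, sock.makefile("rwb")
```

A hypothetical session might be `sock, f = connect("my-roach2")` followed by `katcp_request(f, "listbof")` and then `katcp_request(f, "progdev", "my_design.bof")`. The hostname and file name are placeholders, and the request names are the ones I understand the ROACH KATCP server to accept; sending `?help` over telnet lists what a given board actually supports.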