Re: [casper] Roach-2 crashing fix

2015-07-28 Thread Marc Welz
So I confess to relying on third parties for this information, but isn't
the board populated with 1Gb RAM after all? Would the crash be triggered by
a kernel memory layout of 3Gb+1Gb rather than 2Gb+2Gb?

Have you tried the kernel from 9 months ago on GitHub at
ska-sa/roach2_nfs_uboot?

regards

marc


On Wed, Jun 24, 2015 at 11:49 PM, John Ford jf...@nrao.edu wrote:

 Hi all.

 We were having problems with multiple sequential progdev calls failing on
 our ROACH-2 systems.  We were testing multiple bof files in a loop, and
 the ROACH would fall over and crash completely; after the kernel
 panic, it would reboot itself.

 After a great deal of concentrated debugging effort this afternoon by
 Jack, David, Justin, Ryan, Arindam, Randy, and me, the cause of the
 crashing upon multiple progdev calls was found.  It turned out to have
 nothing to do with programming the chip; rather, it was a problem with
 memory allocation by the operating system.  Jack found that the problem
 could also be caused by allocating a huge array in Python, using lots of
 memory.

 The problem was caused by the kernel thinking that the ROACH has 768 MB of
 memory on board, when in fact it has only 512 MB.  The fix is to pass the
 real amount of memory to the kernel in the bootargs.  The systems have
 been mostly working for a long time (years!), so you may want to check
 that your systems in fact know how much memory they have.  If you start up
 top you can see what the kernel thinks, or look in /proc/meminfo.

 John
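
The check John describes can be scripted. What follows is a minimal sketch, not the procedure used in the thread: it parses the MemTotal line of /proc/meminfo and compares it with the memory physically fitted. The function names and the 512 MB default are illustrative assumptions. `mem=` is the standard Linux kernel parameter for limiting usable memory, but the thread does not say which bootarg was actually changed.

```python
import re

def mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from the text of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))

def kernel_overestimates(meminfo_text, installed_mb=512):
    """True if the kernel reports more memory than is physically installed.

    MemTotal normally sits a little below the installed size because the
    kernel reserves some memory for itself, so a value above it is suspect.
    """
    return mem_total_kb(meminfo_text) > installed_mb * 1024

def mem_bootarg(installed_mb=512):
    """Build a 'mem=' kernel argument (assumed fix, not confirmed in the thread)."""
    return "mem=%dM" % installed_mb
```

On the board itself this could be called as `kernel_overestimates(open("/proc/meminfo").read())`; a True result suggests the bootargs need correcting.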


Re: [casper] ROACH2 one_gbe block losing data on reception

2015-07-28 Thread G Jones
Hi Henno,
Thanks for replying. I meant to mention that the Simulink clock is 250 MHz and
the design meets timing. The ultimate goal is loading waveforms into the
DRAM; since there isn't a CPU DRAM interface for ROACH2 yet, any speed
better than loading via KATCP would be great. But I've tried delays of up
to 10 ms between each packet (i.e. down to about 100 KB/s), and am still
seeing this behavior. The data loss fraction doesn't seem to improve
substantially with reduced data rate. This leads me to suspect some sort of
signal integrity issue, but I'm not sure where.
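
The "about 100 KB/s" figure follows from the packet size and the delay: a 1024-byte payload every 10 ms is 102.4 KB/s. A small sketch of that arithmetic, with a hypothetical function name and the 1 Gb/s line rate (125 MB/s) included only for comparison:

```python
LINE_RATE_BYTES_PER_S = 125e6  # 1 Gb/s expressed in bytes per second

def offered_rate_bytes_per_s(payload_bytes, delay_s):
    """Upper bound on the payload rate when waiting delay_s between packets."""
    return payload_bytes / float(delay_s)

for payload in (1024, 4096, 8192):
    rate = offered_rate_bytes_per_s(payload, 10e-3)
    print("%d-byte payload, 10 ms gap: %.1f KB/s (%.3f%% of line rate)"
          % (payload, rate / 1e3, 100.0 * rate / LINE_RATE_BYTES_PER_S))
```

Even with the largest payload the offered load stays well under 1% of the link capacity, which is why a rate-related cause looks unlikely here.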

Glenn
On Jul 28, 2015 12:50 AM, Henno Kriel he...@ska.ac.za wrote:

 Hi Glenn,

 I assume you are running your fabric  125MHz? What is the data rate you
 are trying to sustain?

 HK

 On Mon, Jul 27, 2015 at 10:06 PM, G Jones glenn.calt...@gmail.com wrote:

 Hi,
 I'm experimenting with the one_gbe block on ROACH2. So far data
 transmission looks flawless: I can capture all the bytes I send. However,
 receiving data from a computer seems to result in missing data.
 My design is very simple: I have rx_ack tied high, and counters on
 rx_val, on the rising edge of rx_eof, and on rx_overrun and rx_badframe.

 If I just send a single packet, all looks good: the rx_val counter shows
 the number of bytes I sent, and rx_eof shows one packet received. But when
 I try to send a sequence of packets, I end up with the rx_val counter
 showing fewer bytes than I sent, by about 0.5-5% depending on the exact
 combination of parameters I use in sending packets. Sometimes all the ~1024
 packets arrive, as indicated by the rx_eof counter, but other times it
 seems 1-3 are missing.

 I see consistent results whether the packet payload is 1024 bytes, 4096
 bytes, or 8192 bytes.

 I never see any counts on the rx_overrun or rx_badframe lines.

 I am using sendall to send the packets from the host computer, and it
 reports that all bytes are being sent, so I assume it's not dropping them
 on the way outbound (plus it seems like it would be weird for the host
 computer to send partial packets).

 I've tried this test with two different network adapters (both directly
 connected to the ROACH2 fabric Ethernet port), and with various lengths of
 CAT 6 Ethernet cable.

 Has anyone used the one_gbe block in this way? Am I missing something?

 Thanks,
 Glenn
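
A test like the one Glenn describes could be sketched as follows. This is an illustration under stated assumptions, not his script: it uses `sendto` on a UDP socket rather than the `sendall` call he mentions, the destination address and port are placeholders, and the 4-byte sequence number at the start of each payload is an addition so that missing data can be located. `loss_fraction` compares the bytes sent with the value read back from the rx_val counter.

```python
import socket
import struct
import time

def make_payload(seq, size):
    """Payload of `size` bytes: 4-byte big-endian sequence number, then zeros."""
    if size < 4:
        raise ValueError("payload must hold the 4-byte sequence number")
    return struct.pack(">I", seq) + b"\x00" * (size - 4)

def send_paced(dest, n_packets, size, delay_s):
    """Send n_packets UDP datagrams to dest=(host, port); return bytes sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    try:
        for seq in range(n_packets):
            sent += sock.sendto(make_payload(seq, size), dest)
            time.sleep(delay_s)  # pacing gap between packets
    finally:
        sock.close()
    return sent

def loss_fraction(bytes_sent, rx_val_count):
    """Fraction of payload bytes that the rx_val counter did not see."""
    return (bytes_sent - rx_val_count) / float(bytes_sent)
```

For example, `send_paced(("10.0.0.2", 10000), 1024, 1024, 10e-3)` would send 1024 packets of 1024 bytes with a 10 ms gap to a hypothetical fabric address.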




 --
 Kind regards,
 Henno Kriel

 Manager: Hardware Engineering
 SKA South Africa
 (p) +27 (0)21 506 7374 (direct)
 (m) +27 (0)84 504 5050
 web: www.ska.ac.za



[casper] Using katcp to Communicate to ROACH 2

2015-07-28 Thread Christopher Barnes
Hello,

Similar to Victor's email earlier today, I am writing to ask for
instructions on using a ROACH 2 from a PC terminal.  I can communicate with
the ROACH 2 via a telnet connection, but I am unable to transfer boffiles
to the board and execute them.  Also, I am unable to complete the tutorials
listed at https://casper.berkeley.edu/wiki/Workshop_2011_Tutorials because
my PC does not recognize the ROACH as a hardware platform.  Help with either
of these issues would be greatly appreciated.

Thank you,

Christopher Barnes


Re: [casper] Roach-2 crashing fix

2015-07-28 Thread David MacMahon
Hi, Marc,

On Jul 28, 2015, at 1:34 AM, Marc Welz wrote:

 So I confess to relying on third parties for this information, but isn't the
 board populated with 1Gb RAM after all?

When U-Boot starts up it reports that the system has 512 MB of memory.  I
assume (uh-oh!) that U-Boot is detecting that size dynamically at run time.  Is
it possible that later production runs of the ROACH2 were populated with
larger-capacity memory chips?

 Would the crash be triggered by a kernel memory layout of 3Gb+1Gb rather than
 2Gb+2Gb?

I don't understand this question.  Can you please clarify?  I don't think it's 
a layout issue, but rather a size issue.

 Have you tried the kernel from 9 months ago at github ska-sa/roach2_nfs_uboot?

I'll have to double check the kernel version that we used.

Thanks,
Dave




Re: [casper] Roach-2 crashing fix

2015-07-28 Thread John Ford
 So I confess to relying on third parties for this information, but isn't
 the board populated with 1Gb RAM after all? Would the crash be triggered by
 a kernel memory layout of 3Gb+1Gb rather than 2Gb+2Gb?

Certainly, if the kernel thinks the layout of memory isn't what it really
is, it could crash.  We'll look into this a bit more.


 Have you tried the kernel from 9 months ago at github
 ska-sa/roach2_nfs_uboot ?

No, we haven't, as far as I know.

John



Re: [casper] ROACH2 one_gbe block losing data on reception

2015-07-28 Thread Henno Kriel
Hi Glenn,

Everything seems to be in order. However, if there were signal integrity
issues, the MAC (TEMAC) would report bad frames due to CRC failure (the CRC
is calculated over the entire Ethernet frame).

Another sanity check would be to send this data to another PC and
confirm that all the packets are received.
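
The receiving side of that sanity check could look roughly like this. It is a sketch under assumptions: the sender is taken to use UDP, and the function names, port and timeout are placeholders rather than anything specified in the thread.

```python
import socket

def open_udp_listener(port, host="0.0.0.0"):
    """Bind a UDP socket on the PC that stands in for the ROACH2."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def count_datagrams(sock, expected_packets, timeout_s=2.0):
    """Receive until expected_packets arrive or the socket times out.

    Returns (packets_received, bytes_received).
    """
    sock.settimeout(timeout_s)
    packets = 0
    total_bytes = 0
    while packets < expected_packets:
        try:
            data = sock.recv(65535)
        except socket.timeout:
            break  # nothing more arrived within timeout_s
        packets += 1
        total_bytes += len(data)
    return packets, total_bytes
```

If the second PC counts every packet and byte that the sender reports, the loss is more likely on the ROACH2 side than in the host or the cable.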

HK


-- 
Kind regards,
Henno Kriel

Manager: Hardware Engineering
SKA South Africa
(p) +27 (0)21 506 7374 (direct)
(m) +27 (0)84 504 5050
web: www.ska.ac.za


[casper] Communicating to ROACH 2

2015-07-28 Thread Victor Cardoso
Hello everybody,
I'm writing to ask for some detailed guidance on connecting my Linux PC to the
ROACH 2 board. I have compiled the Simulink project successfully and I'm able to
watch the U-Boot process on a minicom terminal. However, I don't know how to
proceed from this point.
Could anybody provide basic information about this? The information in the
tutorials on the CASPER website wasn't enough for me.

Thank you in advance,
Victor Cardoso.

Re: [casper] Communicating to ROACH 2

2015-07-28 Thread James Smith
Hello Victor,

Are you planning on using telnet or Python? If Python, check out the
casperfpga module on the ska-sa GitHub. It's not documented in the tutorials
unfortunately, and it takes a little bit of doing to install, but once
you're there it's very easy to use, especially with ipython.
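
For the telnet route, it may help to know that KATCP is a line-based text protocol: requests start with `?`, replies with `!`, and informs with `#`, and spaces inside an argument are escaped as `\_`. The sketch below only formats request lines and checks replies; it does not talk to a board. The request names `listbof` and `progdev` and the file name are given as examples on the assumption that the ROACH's KATCP server offers them (sending `?help` should list what is actually available).

```python
ESCAPES = [
    ("\\", "\\\\"),  # backslash first, so later escapes are not re-escaped
    (" ", "\\_"),
    ("\t", "\\t"),
    ("\n", "\\n"),
    ("\r", "\\r"),
]

def katcp_request(name, *args):
    """Format one KATCP request line, terminated by a newline."""
    words = ["?" + name]
    for arg in args:
        text = str(arg)
        for raw, escaped in ESCAPES:
            text = text.replace(raw, escaped)
        words.append(text if text else "\\@")  # '\@' marks an empty argument
    return " ".join(words) + "\n"

def katcp_reply_ok(line):
    """True if a reply line such as '!progdev ok' reports success."""
    parts = line.strip().split(" ")
    return len(parts) >= 2 and parts[0].startswith("!") and parts[1] == "ok"
```

Typing the output of `katcp_request("progdev", "my_design.bof")` into the telnet session is equivalent to what a Python client sends on your behalf.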

I've recently gone through a similar process with ROACH 1, so if you don't
come right by tomorrow, I (or someone more experienced than I) can give you
some more direction.

Regards,
James

