Hi Dave,
All neat ideas, but I'm pretty sure that at least the CASPER
implementation of the FPGA side of the EPB bus does not support
either burst mode or DMA.
Even if the FPGA does not implement burst mode, and the back-to-back
reads it does implement have more wait states than optimal, I'm still
sure the FPGA<->PowerPC transfer rate will improve; e.g., 16 bits at
33MHz with 4 clock periods per read or write would be 16MB/s, 8 clocks
per read would be 8MB/s ... etc.
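To make that arithmetic explicit, here is a small Python sketch. The
function name bus_rate_mb_per_s is purely illustrative, and it assumes
1MB = 10^6 bytes and that every transfer takes the same fixed number of
clocks, which a real bus may not honor.

```python
def bus_rate_mb_per_s(width_bits, clock_hz, clocks_per_transfer):
    """Sustained transfer rate in MB/s, taking 1 MB as 1e6 bytes."""
    bytes_per_transfer = width_bits / 8
    transfers_per_second = clock_hz / clocks_per_transfer
    return bytes_per_transfer * transfers_per_second / 1e6


# 16-bit bus at 33MHz, with 4 and 8 clock periods per read or write
for clocks in (4, 8):
    rate = bus_rate_mb_per_s(16, 33e6, clocks)
    print(f"{clocks} clocks per transfer: {rate:.2f} MB/s")
```

The exact figures are 16.5MB/s and 8.25MB/s, consistent with the
rounded 16MB/s and 8MB/s quoted above.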
The minimum effort required to see if this is worth pursuing is
to determine the "best case" scenario for data transfer given
the current FPGA bus interface implementation, i.e., determine how
slow the FPGA<->PPC interface is. If you have a simulation of that
interface, you can run it, and have your answer with very little
effort. If you don't, or can't be bothered to find it and learn how
to run it under Modelsim, you can do a hardware test. Boot the
board, stop in U-Boot, write to the DMA registers directly, and
probe the bus to see what the maximum rate possible is ... then
you can decide whether to write a device driver to use the DMA
controller under Linux.
If Ross has never had to do this before, he might not have
appreciated where to look/where to start.
Cheers,
Dave
On Oct 16, 2013, at 9:13 AM, David Hawkins wrote:
Hey Ross,
Thanks everyone for the input.
I've been trying not to move towards the 10 gb option if possible
for a couple of reasons but I might have to bite the bullet.
I'm also going to look at another option: with the mmap kernel I
have access to the FPGA memory through /dev/roach/mem. I think it
should be fairly easy to write a custom server that runs on the roach,
collects the bram data from the mem, and then buffers it for sending
over ethernet via udp. This is probably very similar to what the UDP
option in tcpborphserver does, but I'll be able to control the amount
of buffering.
I guess you could call it a replacement for tcpborphserver that
is specific for my application.
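A rough sketch of what such a server could look like, in Python. Only
the /dev/roach/mem path comes from this thread; the function names, the
BRAM offset and length, the chunk size, and the destination address are
all hypothetical placeholders. Whether slicing the mapping produces bus
accesses the FPGA interface is happy with would need checking on the
hardware.

```python
import mmap
import os
import socket


def read_region(path, offset, length):
    """Return `length` bytes starting at byte `offset` of a mappable file."""
    page = mmap.ALLOCATIONGRANULARITY
    base = (offset // page) * page   # mmap offsets must be page-aligned
    delta = offset - base
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, delta + length, access=mmap.ACCESS_READ,
                       offset=base) as m:
            return m[delta:delta + length]   # copies the bytes out of the mapping
    finally:
        os.close(fd)


def send_chunks(data, sock, addr, chunk_size=1024):
    """Send `data` as UDP datagrams of at most `chunk_size` bytes; return the count."""
    count = 0
    for start in range(0, len(data), chunk_size):
        sock.sendto(data[start:start + chunk_size], addr)
        count += 1
    return count


# Example use on the board (offset, length, and address are made-up values):
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   data = read_region("/dev/roach/mem", 0x10000, 8192)
#   send_chunks(data, sock, ("192.168.1.10", 5000))
```

The amount of buffering is then under your control: read_region sets how
much BRAM is collected per pass, and chunk_size sets how it is split
into datagrams.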
Assuming the bottleneck in your datapath is the FPGA<->PowerPC
interface ...
If the ROACH interface is using /dev/mem for access, then the PowerPC
CPU core is performing the reads; in general, a CPU core will only
ever issue single reads to the external bus (unless you mark the
memory as cacheable, or perform a double-precision floating-point
read). If you want to improve speed, then you need to use DMA to
move data from the local bus to memory, and from there to the
network.
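Before investing in DMA, it may be worth timing what the CPU-driven
path achieves today. The Python sketch below is one way to get a rough
number; the function name measure_read_rate is hypothetical, and it
assumes the memory is exposed as a mappable file. It times a bulk copy
out of the mapping, so it measures CPU reads plus copy overhead, not
DMA, and it says nothing about what is happening on the bus itself.

```python
import mmap
import os
import time


def measure_read_rate(path, length, repeats=10):
    """Best observed rate in MB/s for copying `length` bytes out of a mapping."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, length, access=mmap.ACCESS_READ) as m:
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                _ = m[:length]   # the CPU copies every byte out of the mapping
                best = min(best, time.perf_counter() - start)
    finally:
        os.close(fd)
    best = max(best, 1e-9)       # guard against a zero reading from the timer
    return length / best / 1e6


# Example use on the board (the length is a made-up value):
#   print(measure_read_rate("/dev/roach/mem", 65536))
```

Taking the best of several runs filters out scheduling noise; the result
is only a baseline to compare a DMA-based transfer against.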
When I was looking at the 440EP, I ran some DMA tests to the PCI
connector on the Yosemite evaluation board ...
http://www.ovro.caltech.edu/~dwh/powerpc_440ep.pdf
What you could do is run some DMA tests to the local bus and probe
the bus with a scope to see that you are getting back-to-back
reads.
Assuming the ROACH interface to the PowerPC local bus is 16-bits
at 33MHz, you should be able to approach 50MB/s transfer rates.
The FPGA-side of the local bus would need to implement burst
transactions; I don't recall the details of this processor's local
bus (I ended up using a Freescale PowerPC instead).
Cheers, Dave