Hi all,
I would be very interested in an example of DMA transfer from PL to PS
in CASPER. I am working on a Knowledge Resources module that has 8 GB in
both PL and PS; an 8 GB readable buffer in PL would be very useful.
Cheers,
Kaj
On 5.10.2023 18.05, Matthew Schiller wrote:
Yeah, it depends on what you want to do with the data. If the ARM is
further processing the data, then DMA usually makes sense, because the
ARM can access its own memory more quickly and use the L1/L2 cache for
data in the PS memory. It also avoids the ARM spending clock cycles
reading data from the PL and copying it into the PS memory (which is
likely what will happen in some processing, say if you were executing an
FFT in software or trying to generate an Ethernet packet in software to
send the data to a user display).
One copy isn’t too bad, but it can get ridiculous if you make more than
one copy. But there’s also how the copy is done… What you almost never
want is a memcpy done via a software for loop or similar, because if you
aren’t extremely careful coding that up in software it won’t even use
the ARM DMA blocks buried in the processor to do it. So if your software
application ends up reading from the PL memory and writing to another
block of memory in the PS (e.g. a software-defined array variable),
that’s probably a sign that DMA is the right thing to do, so the ARM
isn’t spending processor cycles just copying data.
But whenever you do something with DMA… Your complexity just shot
through the roof, even if it can technically net performance gains.
But there are certainly cases where there is no appreciable performance
gain, and the complexity is not warranted, e.g. if you aren’t heavily
using the ARM processor for processing and just desperately want to read
some I/Q data into a file on disk. You might not care that the processor
is spinning reading data in that case. Especially since using DMA
effectively in Linux for “direct storage” to an NVMe drive or SD card
is, well, really only something the industry is just starting to do on
x86 machines (e.g. Direct Storage with GPUs), so you are probably going
to be forced to let software handle that anyway on ARM rather than
writing very complicated driver-level code to do it….
Gah, complicated software stuff galore in that discussion… Because
properly handling data movement in embedded (software) systems for
highest performance is, well, complicated…. And I probably shouldn’t
talk much about it… I’m an FPGA engineer, not a kernel-level embedded
software engineer.
Matthew Schiller
ngVLA Digital Backend Lead
NRAO
mschi...@nrao.edu <mailto:mschi...@nrao.edu>
315-316-2032
*From:* 'Ross Martin' via casper@lists.berkeley.edu
<casper@lists.berkeley.edu>
*Sent:* Thursday, October 5, 2023 10:46 AM
*To:* casper <casper@lists.berkeley.edu>
*Subject:* Re: [casper] PL data to PS DDR4 (AXI) {External}
DMA isn't always the best answer.
It's sometimes best to just leave the data in the PL and have the
processor access it directly.
If the processor reads the data directly, it's just accessed once, and
only the data you need is accessed.
If you transfer via DMA, it's read once by the DMA from the PL, written
to PS memory once by the DMA, and then read again by the processor to do
its processing. Also, the DMA may have to transfer more data than the
current processing actually needs, since it may need to account for
contingencies.
So although DMA access *might* be faster access, it's definitely
accessing the data more times. It won't always be worth it.
DMA also adds an additional layer of software complexity.
As an example of the non-DMA solution, the demo I released for the
RFSoC4x2 pulls the data directly from the PL into the ARM without doing
any DMA.
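For readers who have not seen the non-DMA route, here is a minimal sketch of reading 32-bit words straight out of a memory-mapped window. It is not taken from the RFSoC4x2 demo; `read_words`, `demo`, and the offsets are hypothetical names chosen for illustration. On real hardware the path would typically be something like /dev/mem or a UIO device and the offset would be the physical base address of the PL region (both assumptions here, and both usually need root). An ordinary temporary file stands in for the device so the sketch runs anywhere.

```python
import mmap
import os
import struct
import tempfile


def read_words(path, base, count):
    """Read `count` little-endian 32-bit words starting at byte offset `base`.

    `base` must be a multiple of mmap.ALLOCATIONGRANULARITY, because mmap
    only accepts aligned offsets.
    """
    length = count * 4
    with open(path, "rb") as f:
        # Map only the window we need, read-only.
        with mmap.mmap(f.fileno(), length,
                       access=mmap.ACCESS_READ, offset=base) as window:
            return list(struct.unpack("<%dI" % count, window[:length]))


def demo():
    """Stand-in for a device: a temp file with four known words at `base`."""
    base = mmap.ALLOCATIONGRANULARITY  # hypothetical "PL base address"
    words = [0xDEADBEEF, 1, 2, 3]
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * base + struct.pack("<4I", *words))
        name = tmp.name
    try:
        return read_words(name, base, len(words))
    finally:
        os.remove(name)


if __name__ == "__main__":
    print([hex(w) for w in demo()])
```

Only the words that are actually requested get touched, which is the point made above. Against real device memory the access width and caching behaviour matter, so treat this as an outline of the idea rather than a driver.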
Regards,
Ross
On Thu, Oct 5, 2023, 3:06 PM Jack Hickish <jackhick...@gmail.com
<mailto:jackhick...@gmail.com>> wrote:
This seems like a fun "discuss at the workshop" topic!
I have a couple of applications where I think this functionality
would be useful, so I'd definitely be interested in helping out.
From a toolflow side I think getting the automated instantiation of
the DMA IP should be relatively straightforward. Handling what the
CPU does to interact with the core, and/or how you might interact
with the core remotely over a network I'm less sure about.
Cheers
Jack
On Thu, 5 Oct 2023 at 12:08, Matthew Schiller <mschi...@nrao.edu
<mailto:mschi...@nrao.edu>> wrote:
The right way to do what you describe is with the AXI DMA block,
but as you point out that has a software interface to configure
the transfer. The main data would flow over an AXI4 “full”
interface that supports burst transactions (the Xilinx-provided
DMA block already does that), and the configuration of the DMA
block comes from software over AXI4-Lite. There are two
approaches (which should be supported by either using the correct
DMA block or the correct settings on the DMA block). A standard
DMA block can be used if fixed addresses in memory can be
allocated. This would mean, for example, that the Linux kernel is
told to use only ½ of the PS memory. Software can still access
the upper half, for example through /dev/mem reads, but the upper
memory disappears from Linux for normal applications.
Alternatively, though more complicated, a “scatter-gather” DMA
can be implemented. A scatter-gather DMA uses a software
driver/server that will “malloc” memory in a normal software way,
and then provide pointers to that memory to the scatter-gather
DMA. Because of the way virtual memory works, this is not as
trivial as it sounds and requires several steps to accomplish, as
the FPGA needs the physical, not virtual, address, and must
respect the fact that memory is allocated in virtual memory in
“pages” and not necessarily contiguously.
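To make the page issue concrete, here is a toy sketch of how a virtually contiguous buffer turns into a list of (physical address, length) chunks for a scatter-gather engine. Everything in it is an assumption for illustration: the 4096-byte page size, the `build_descriptors` name, and the `page_table` dictionary, which stands in for the lookup a real kernel driver would do after pinning the pages. It does not model the actual descriptor format of any Xilinx DMA core.

```python
PAGE_SIZE = 4096  # assumed page size


def build_descriptors(virt_addr, length, page_table, page_size=PAGE_SIZE):
    """Split a virtually contiguous buffer into (physical_address, length) chunks.

    page_table maps virtual page number -> physical frame number. Chunks never
    cross a page boundary unless the two pages are also physically adjacent,
    in which case they are merged into one descriptor.
    """
    descriptors = []
    addr = virt_addr
    remaining = length
    while remaining > 0:
        offset = addr % page_size
        chunk = min(page_size - offset, remaining)
        phys = page_table[addr // page_size] * page_size + offset
        if descriptors and descriptors[-1][0] + descriptors[-1][1] == phys:
            # Physically adjacent to the previous chunk: extend it.
            prev_phys, prev_len = descriptors[-1]
            descriptors[-1] = (prev_phys, prev_len + chunk)
        else:
            descriptors.append((phys, chunk))
        addr += chunk
        remaining -= chunk
    return descriptors


if __name__ == "__main__":
    # Virtual pages 16 and 17 happen to be adjacent in physical memory,
    # virtual page 18 lives somewhere else entirely.
    table = {16: 100, 17: 101, 18: 7}
    for phys, size in build_descriptors(16 * PAGE_SIZE + 256, 10000, table):
        print(hex(phys), size)
```

A 10000-byte buffer that software sees as one block becomes two separate transfers in this example, which is why the driver side of scatter-gather takes several steps.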
sgDMA is better in many systems though, because Linux can still
access all the memory, so if you aren’t recording data, for
example, more complicated software applications can run.
I don’t believe this has been done yet in CASPER, but it is
possible since these are standard Xilinx-provided blocks. We
just need to get the block instantiated properly in sysgen to
accept an AXI streaming data stream from your DSP algorithm or
the ADCs, and then on the ARM processor we need appropriate
software/drivers to allocate memory and configure the DMA.
I think I heard a rumor that it was planned, but hasn’t been
tackled yet.
With AXI DMA, you can probably get to around 20 Gbit/sec (in
theory probably as high as 40, depending on what speed the DDR4
trains to) or better transfer performance to the PL. Not that
the little ARM on these FPGAs can do much with that speed of
data, but for recording a snippet of data or something like that
it can allow some fairly significant sample rates of I/Q data,
for example (at 8-bit I/Q that’s >1 GSPS!). If instead you did
the register approach you mentioned, I would expect rates around
100 Mbit/sec to be possible, and to achieve that the ARM
processor will be going nuts, because AXI4-Lite tends to require
the processor to spin (DMA frees the processor for other stuff,
while polling registers takes time to accomplish).
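The arithmetic behind those sample rates is short enough to write down. The sketch below assumes one complex sample is an I value plus a Q value of equal width and ignores all protocol overhead; `iq_sample_rate` is a hypothetical helper name, and the link rates are simply the rough estimates quoted above, not measurements.

```python
def iq_sample_rate(link_bits_per_s, bits_per_component):
    """Complex samples per second a link can carry, ignoring overhead."""
    bits_per_sample = 2 * bits_per_component  # one I value plus one Q value
    return link_bits_per_s / bits_per_sample


if __name__ == "__main__":
    for label, rate in [("AXI DMA, ~20 Gbit/s", 20e9),
                        ("AXI DMA, ~40 Gbit/s", 40e9),
                        ("register polling, ~100 Mbit/s", 100e6)]:
        msps = iq_sample_rate(rate, 8) / 1e6
        print("%-32s %10.2f MSPS of 8-bit I/Q" % (label, msps))
```

At 8-bit I/Q each sample costs 16 bits, so 20 Gbit/s works out to 1.25 GSPS, while the 100 Mbit/s register path lands at 6.25 MSPS.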
FWIW: ngVLA plans to create functionality like this in “pure”
HDL, and given the current effort to use more VHDL/Verilog blocks
in CASPER, ngVLA’s work may be useful in the future. I hope to
make progress on ngVLA’s approach later this calendar year. But
ngVLA is on Intel FPGAs, so a porting process would still be
required to get that into CASPER.
*From:* casper@lists.berkeley.edu
<mailto:casper@lists.berkeley.edu> <casper@lists.berkeley.edu
<mailto:casper@lists.berkeley.edu>> *On Behalf Of *Ken Semanov
*Sent:* Thursday, October 5, 2023 4:11 AM
*To:* casper@lists.berkeley.edu <mailto:casper@lists.berkeley.edu>
*Subject:* [casper] PL data to PS DDR4 (AXI) {External}
Is there an obvious way to migrate data from the PL into memory
that is mapped into the address space of the PS? Ideally I
would use axi_interconnect as shown
https://casper-toolflow.readthedocs.io/en/latest/axi4lite_documentation.html
A possible approach is to instantiate axi_dma within the PL,
with the PL acting as the master during transfers. The axi_dma
exposes an AXI4-Lite slave port to the PS so that the PS
configures and starts the transfers. The receiving raw device
would be the memory controller of the PS DDR4. (Presumably
the data is accessed later by software via the DMA engine.)
Another approach would be to expose a single register, and
perform this slowly word-by-word (without streaming or bursting).
Is this plausible in CASPER, or are steep changes required?
--
You received this message because you are subscribed to the
Google Groups "casper@lists.berkeley.edu
<mailto:casper@lists.berkeley.edu>" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to casper+unsubscr...@lists.berkeley.edu
<mailto:casper+unsubscr...@lists.berkeley.edu>.
To view this discussion on the web visit
https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/0424800a-035f-447f-92ed-07402b9d0239n%40lists.berkeley.edu
<https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/0424800a-035f-447f-92ed-07402b9d0239n%40lists.berkeley.edu?utm_medium=email&utm_source=footer>.