Hi all,

I would be very interested in an example of DMA transfer from PL to PS in CASPER. I am working on a Knowledge Resources module that has 8 GB in both the PL and the PS; an 8 GB readable buffer in the PL would be very useful.

Cheers,
Kaj

On 5.10.2023 18.05, Matthew Schiller wrote:
Yeah, it depends on what you want to do with the data.  If the ARM is further processing the data, then DMA usually makes sense, because the ARM can access its own memory more quickly and can use the L1/L2 cache for data in the PS memory.  It also avoids the ARM spending clock cycles reading data from the PL and copying it into the PS memory (which is likely what will happen in some processing, say if you were executing an FFT in software or generating an Ethernet packet in software to send the data to a user display).


One copy isn’t too bad, but it can get ridiculous if you make more than one.  But there’s also how the copy is done… What you almost never want is a memory copy done via a software for loop or similar, because if you aren’t extremely careful coding that up in software it won’t even use the ARM DMA blocks buried in the processor to do it.  So if your software application ends up reading from the PL memory and writing to another block of memory in the PS (e.g. a software-defined array variable), that’s probably a sign that DMA is the right thing to do, so the ARM isn’t spending processor cycles just copying data.  But whenever you do something with DMA… your complexity just shot through the roof, even if it can technically net performance gains.

But there are certainly cases where there is no appreciable performance gain and the complexity is not warranted, e.g. if you aren’t heavily using the ARM processor for processing and just desperately want to read some I/Q data into a file on disk.  You might not care that the processor is spinning reading data in that case.  Especially since using DMA effectively in Linux for “direct storage” to an NVMe drive or SD card is, well, really only something the industry is just starting to do on x86 machines (e.g. Direct Storage with GPUs), so on ARM you are probably going to be forced to let software handle that anyway rather than writing very complicated driver-level code to do it….

Gah, complicated software stuff galore in that discussion…  Because properly handling data movement in embedded (software) systems for highest performance is, well, complicated….  And I probably shouldn’t talk much about it… I’m an FPGA engineer, not a kernel-level embedded software engineer.

Matthew Schiller

ngVLA Digital Backend Lead

NRAO

mschi...@nrao.edu <mailto:mschi...@nrao.edu>

315-316-2032

*From:* 'Ross Martin' via casper@lists.berkeley.edu <casper@lists.berkeley.edu>
*Sent:* Thursday, October 5, 2023 10:46 AM
*To:* casper <casper@lists.berkeley.edu>
*Subject:* Re: [casper] PL data to PS DDR4 (AXI) {External}

DMA isn't always the best answer.

It's sometimes best to just leave the data in the PL and have the processor access it directly.

If the processor reads the data directly, it's just accessed once, and only the data you need is accessed.

If you transfer via DMA, it's read once by the DMA from the PL, written to PS memory once by the DMA, and then read again by the processor to do its processing.  Also, the DMA may have to transfer more data than the current processing actually needs, since it may need to account for contingencies.

So although DMA access *might* be faster access, it's definitely accessing the data more times.  It won't always be worth it.
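
To put rough numbers on that counting argument, here is a minimal sketch. The function name and the example sizes are made up for illustration, and it only counts word accesses; it says nothing about how fast each kind of access is.

```python
def access_counts(words_needed, words_transferred):
    """Count word accesses for direct PL reads versus a DMA transfer.

    words_needed:      words the processing actually uses
    words_transferred: words the DMA moves (at least words_needed, possibly
                       more if it has to cover contingencies)
    """
    if words_transferred < words_needed:
        raise ValueError("DMA must transfer at least the words that are needed")
    direct = words_needed            # CPU reads each needed word once from the PL
    dma = (words_transferred         # DMA reads the block from the PL
           + words_transferred       # DMA writes the block to PS memory
           + words_needed)           # CPU reads what it needs from PS memory
    return {"direct": direct, "dma": dma}


# Hypothetical case: the processing needs 1000 words, the DMA moves a 4096-word block.
counts = access_counts(1000, 4096)
print(counts)  # {'direct': 1000, 'dma': 9192}
```

Even when the DMA moves exactly what is needed, the count is three accesses per word instead of one, so whether DMA wins depends on how much cheaper the PS-memory reads are than the direct PL reads.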

DMA also adds an additional layer of software complexity.

As an example of the non-DMA solution, the demo I released for the RFSoC4x2 pulls the data directly from the PL into the ARM without doing any DMA.

Regards,

Ross

On Thu, Oct 5, 2023, 3:06 PM Jack Hickish <jackhick...@gmail.com <mailto:jackhick...@gmail.com>> wrote:

    This seems like a fun "discuss at the workshop" topic!

    I have a couple of applications where I think this functionality
    would be useful, so I'd definitely be interested in helping out.

    From a toolflow side I think getting the automated instantiation of
    the DMA IP should be relatively straightforward. Handling what the
    CPU does to interact with the core, and/or how you might interact
    with the core remotely over a network I'm less sure about.

    Cheers

    Jack

    On Thu, 5 Oct 2023 at 12:08, Matthew Schiller <mschi...@nrao.edu
    <mailto:mschi...@nrao.edu>> wrote:

        The right way to do what you describe is with the AXI DMA block,
        but as you point out, that has a software interface to configure
        the transfer.  The main data would flow over an AXI4 “full”
        interface that supports burst transactions (the Xilinx-provided
        DMA block already does that), and the configuration of the DMA
        block comes from software over AXI4-Lite.

        There are two approaches (both should be supported, either by
        using the correct DMA block or by the correct settings on the
        DMA block).  A standard DMA block can be used if fixed addresses
        in memory can be allocated.  This would mean that the Linux
        kernel is told to use only, for example, ½ of the PS memory.
        Software can still access the upper half through, for example,
        /dev/mem reads, but the upper memory disappears from Linux for
        normal applications.  Alternatively, though more complicated, a
        “scatter-gather” DMA can be implemented.  A scatter-gather DMA
        uses a software driver/server that will “malloc” memory in the
        normal software way and then provide pointers to that memory to
        the scatter-gather DMA.  Because of the way virtual memory
        works, this is not as trivial as it sounds and requires several
        steps to accomplish: the FPGA needs the physical, not the
        virtual, address, and must respect the fact that memory is
        allocated in virtual memory in “pages” and not necessarily
        contiguously.
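
        For the fixed-address approach, reading the reserved region from
        user space might look roughly like the sketch below.  This is
        only an illustration: read_u32_words is a made-up helper, and a
        temporary file stands in for /dev/mem so the example can run on
        any machine.

```python
import mmap
import os
import struct
import tempfile


def read_u32_words(path, byte_offset, count):
    """Read `count` little-endian 32-bit words starting at `byte_offset`."""
    gran = mmap.ALLOCATIONGRANULARITY      # mmap offsets must be a multiple of this
    base = (byte_offset // gran) * gran    # round down to a mappable boundary
    delta = byte_offset - base             # where the data sits inside the mapping
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, delta + 4 * count,
                       access=mmap.ACCESS_READ, offset=base) as window:
            return list(struct.unpack_from("<%dI" % count, window, delta))
    finally:
        os.close(fd)


# Stand-in for the reserved region: an ordinary file holding the words 0, 1, 2, ...
n_words = mmap.ALLOCATIONGRANULARITY // 2   # two granules' worth of 32-bit words
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack("<%dI" % n_words, *range(n_words)))
    demo_path = f.name

start = mmap.ALLOCATIONGRANULARITY + 8      # deliberately not granule-aligned
words = read_u32_words(demo_path, start, 3)
os.remove(demo_path)
print(words)
```

        On a real board the path would be /dev/mem and the offset the
        physical base of whatever region was kept away from Linux.  That
        typically needs root, and some kernel configurations restrict
        /dev/mem access.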

        sgDMA is better in many systems, though, because Linux can still
        access all the memory, so when you aren’t recording data, for
        example, more complicated software applications can run.

        I don’t believe this has been done yet in CASPER, but it is
        possible since these are standard Xilinx-provided blocks.  We
        just need to get the block instantiated properly in sysgen to
        accept an AXI streaming data stream from your DSP algorithm or
        the ADCs, and then on the ARM processor we need appropriate
        software/drivers to allocate memory and configure the DMA.

        I think I heard a rumor that it was planned, but hasn’t been
        tackled yet.

        With AXI DMA, you can probably get around 20 Gbit/sec or better
        (in theory probably as high as 40, depending on what speed the
        DDR4 trains to) of transfer performance from the PL.  Not that
        the little ARM on these FPGAs can do much with data at that
        speed, but for recording a snippet of data or something like
        that, it can allow some fairly significant sample rates of I/Q
        data, for example (at 8-bit I/Q that’s >1 GSPS!).  If instead
        you did the register approach you mentioned, I would expect
        rates around 100 Mbit/sec to be possible, and to achieve that
        the ARM processor will be going nuts, because AXI4-Lite tends to
        require the processor to spin (DMA frees the processor for other
        stuff, while polling registers takes time to accomplish).
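
        The sample-rate arithmetic behind those figures, as a quick
        sketch.  It assumes “8-bit I/Q” means 8 bits each for I and Q
        (16 bits per complex sample) and ignores any protocol overhead;
        the function name is made up for illustration.

```python
def complex_sample_rate(link_bits_per_sec, bits_per_component):
    """Complex samples per second a link can carry (one I and one Q per sample)."""
    bits_per_sample = 2 * bits_per_component
    return link_bits_per_sec / bits_per_sample


dma_rate = complex_sample_rate(20e9, 8)    # ~20 Gbit/sec via AXI DMA
reg_rate = complex_sample_rate(100e6, 8)   # ~100 Mbit/sec via register reads

print(f"DMA:       {dma_rate / 1e9:.2f} GSPS")   # DMA:       1.25 GSPS
print(f"registers: {reg_rate / 1e6:.2f} MSPS")   # registers: 6.25 MSPS
```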

        FWIW: ngVLA plans to create functionality like this in “pure”
        HDL, and given the current effort to use more VHDL/Verilog blocks
        in CASPER, ngVLA’s work may be useful in the future.  I hope to
        make progress on ngVLA’s approach later this calendar year.  But
        ngVLA is on Intel FPGAs, so a porting process would still be
        required to get that into CASPER.

        *From:* casper@lists.berkeley.edu
        <mailto:casper@lists.berkeley.edu> <casper@lists.berkeley.edu
        <mailto:casper@lists.berkeley.edu>> *On Behalf Of *Ken Semanov
        *Sent:* Thursday, October 5, 2023 4:11 AM
        *To:* casper@lists.berkeley.edu <mailto:casper@lists.berkeley.edu>
        *Subject:* [casper] PL data to PS DDR4 (AXI) {External}

        Is there an obvious way to migrate data from the PL into memory
        that is mapped into the address space of the PS?  Ideally I
        would use axi_interconnect as shown at
        https://casper-toolflow.readthedocs.io/en/latest/axi4lite_documentation.html

        A possible approach is to instantiate axi_dma within the PL,
        with the PL acting as the master during transfers.  The axi_dma,
        however, exposes an AXI4-Lite slave port to the PS so that the
        PS configures and starts the transfers.  The receiving raw
        device would be the memory controller of the PS DDR4.
        (Presumably the data is accessed later by software via the DMA
        engine.)

        Another approach would be to expose a single register, and
        perform this slowly word-by-word (without streaming or bursting.)

        Is this plausible in CASPER, or are steep changes required?

        --
        You received this message because you are subscribed to the
        Google Groups "casper@lists.berkeley.edu" group.
        To unsubscribe from this group and stop receiving emails from
        it, send an email to casper+unsubscr...@lists.berkeley.edu.
        To view this discussion on the web visit
        https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/0424800a-035f-447f-92ed-07402b9d0239n%40lists.berkeley.edu
