Re: usb_pc_cpu_flush

Ian Lepore Wed, 14 Jan 2015 07:04:42 -0800

On Wed, 2015-01-14 at 06:13 +0100, Hans Petter Selasky wrote:
> On 01/13/15 21:25, Ian Lepore wrote:
> > On Tue, 2015-01-13 at 17:52 +0100, Hans Petter Selasky wrote:
> >> On 01/13/15 17:29, Ian Lepore wrote:
> >>> On Tue, 2015-01-13 at 16:57 +0100, Hans Petter Selasky wrote:
[...]
> >>
> >> Hi,
> >>
> >> Can you give an example of a NIC driver which you consider a good
> >> example which use DMA both for data (not only mbufs) and the control
> >> path and use busdma as you consider to be correct?
> >>
> >> --HPS
> >
> > dev/ffec/if_ffec.c.  I'm not happy with the fact that it takes two calls
> > (a PRE and a POST) to accomplish a single action, but that's the right
> > way to do things in the current busdma world, PRE and POST operations
> > need to be paired.
> >
> > I think we need new busdma support for shared concurrent access
> > descriptors, because it's a type of dma that just isn't supported well
> > by the existing API.  There should be a new flag for bus_dmamem_alloc()
> > that indicates the memory is going to be used for such shared access
> > (because some platforms may be able to provide memory that's mapped
> > optimally for such situations), and there should be new sync ops that
> > don't require a pair of calls to accomplish a single action.
> >
> > All of this is in the context of shared descriptor memory.  Regular IO
> > buffers should just use the proper sequence of PRE and POST syncs (and
> > most especially should *never* do POSTREAD before PREREAD like the
> > current usb_pc_cpu_invalidate() does, because with bounce buffers that
> > will just not work right).
> >
> 
> Hi,
> 
> I understand and that can be done for IO-buffers like in the FFEC 
> driver. For other buffers it is a bigger task, however:
> 
> I see that some memory is allocated using "BUS_DMA_COHERENT" in 
> if_ffec.c and in those cases no busdma sync operations are performed! Is 
> that correct? For example "rxdesc_ring" is allocated coherently and no 
> cache sync ops are done before and after access.
>


No, it does do sync operations on the descriptors rings, because one of
the documented rules of coherent memory is that you still have to do all
the normal sync ops.  Traditionally coherent has meant "completely
disable caching" on arm and you could get away with not doing the sync
ops, but that's not true anymore.  Arm memory subsystems are more
complex these days and sync ops are needed in some cases even for
uncached memory.

In the current ffec code, sync ops on descriptor memory are done on line
655/657 (these could really be back to back without the TDAR write
between them).  At this point new descriptors have been written to the
tx ring and the PREWRITE makes the changes visible to the hardware, the
write into the TDAR register informs the hardware that descriptors were
changed and it should rescan the list.  The POSTWRITE is a no-op but
included for technical correctness.

Another desc-ring related set of sync ops is on lines 683-684.  In this
case the hardware has updated some of the descriptors and the
PREREAD/POSTREAD sync ops ensure that the cpu will see the changes.  The
same is true on lines 850-851.  This is the place where having two calls
is really a performance hit, because both PRE and POSTREAD sync will do
an invalidate, and it really only needs to be done once, but the busdma
conventions are that PRE and POST ops need to come in properly-sequenced
pairs.

And finally on 899-900 is the rx code similar to the tx code on 655/657,
where the driver has updated descriptors in the ring and done a PREWRITE
to make the changes visible to the hardware.

> The buffer that Mr. Kott hit a crash on should also be allocated 
> coherently. Looking at the ARM busdma code in -current I see a possible bug:
> 
> > int
> > _bus_dmamap_load_buffer(bus_dma_tag_t dmat, bus_dmamap_t map, void *buf,
> >     bus_size_t buflen, struct pmap *pmap, int flags, bus_dma_segment_t 
> > *segs,
> >     int *segp)
> > {
> >         bus_size_t sgsize;
> >         bus_addr_t curaddr;
> >         struct sync_list *sl;
> >         vm_offset_t vaddr = (vm_offset_t)buf;
> >         int error = 0;
> >
> >         if (segs == NULL)
> >                 segs = dmat->segments;
> >         if ((flags & BUS_DMA_LOAD_MBUF) != 0)
> >                 map->flags |= DMAMAP_CACHE_ALIGNED;
> >
> >         if ((dmat->flags & BUS_DMA_COULD_BOUNCE) != 0) {
> >                 _bus_dmamap_count_pages(dmat, map, pmap, buf, buflen, 
> > flags);
> >                 if (map->pagesneeded != 0) {
> >                         error = _bus_dmamap_reserve_pages(dmat, map, flags);
> >                         if (error)
> >                                 return (error);
> >                 }
> >         }
> >         CTR3(KTR_BUSDMA, "lowaddr= %d boundary= %d, "
> >             "alignment= %d", dmat->lowaddr, dmat->boundary, 
> > dmat->alignment);
> >
> >         while (buflen > 0) {
> >                 /*
> >                  * Get the physical address for this segment.
> >                  */
> >                 if (__predict_true(pmap == kernel_pmap)) {
> >                         curaddr = pmap_kextract(vaddr);
> >                 } else {
> >                         curaddr = pmap_extract(pmap, vaddr);
> >                         map->flags &= ~DMAMAP_COHERENT;
> >                 }
> 
> If the memory was allocated coherently and we clear the DMAMAP_COHERENT 
> flag when we are loading the memory, the memory will get freed to the 
> wrong bucket when we later free it - right?
> 
> >         if (map->flags & DMAMAP_COHERENT)
> >                 ba = coherent_allocator;
> >         else
> >                 ba = standard_allocator;
> >
> 
> Kott: Can you try to comment out:
> 
> map->flags &= ~DMAMAP_COHERENT;
> 
> In "sys/arm/arm/busdma_machdep.c" and see if it makes any difference?
> 
> Ian: Can you explain what the check "pmap == kernel_pmap" means when it 
> is true? Does it mean that the memory is mapped into virtual kernel 
> space? And if false?
> 

Firstly, that's armv4 code that isn't in play on an armv6/7 platform.

Secondly, remember how I said recently that handling IO directly in/out
of userland buffers wasn't likely to work correctly on arm?  (I haven't
forgotten that you asked a followup question, I just haven't found time
to reply in detail yet.)  What you've spotted above is a little bit of
the old attempts to try to support that on armv4.  It never worked
right.  The bottom line is that pmap will always equal kernel_pmap.  The
else of that code really should be changed into a panic().

> To optimise BUSDMA allocations the USB stack will allocate a PAGE_SIZE 
> object and then load smaller parts of the PAGE_SIZE allocation using 
> BUSDMA. Is this supported by BUSDMA?

If you are allocating the memory using bus_dmamem_alloc() then no,
absolutely not.  It's working for you by accident because of the
USB_HOST_ALIGN hacks which I think date back to freebsd 8 or so.

When I talked to you a couple years ago about doing it the right way,
you were outraged (perhaps rightly so) about the inefficiency involved,
because of things like allocating a 16 byte buffer causing the
allocation of a full page just so that the caching attributes could be
changed.  So I fixed that with the busdma zone allocator stuff that
makes it very efficient to allocate many tiny dma buffers (coherent or
not), and when you use buffers allocated that way, the sync operations
code can make some safe assumptions and avoid doing bounces due to
cacheline alignments.

If buffers were allocated right-sized now that it's efficient to do so,
the USB_HOST_ALIGN hack could go away.   Hmm, actually it might be
necessary to start using the busdma zone allocator in mips as well
before undoing the hack.

-- Ian

_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-usb
To unsubscribe, send any mail to "[email protected]"

Re: usb_pc_cpu_flush

Reply via email to