shrinkage

David Brownell Mon, 05 Jan 2009 14:08:23 -0800

On Monday 05 January 2009, Griffis, Brad wrote:
> David,
> 
> I'd like to make a few comments just to be sure that you
> understand a few things related to items you removed.


Not all the hardware capabilities are well packaged by
this code, of course.

I think the only one of these things that count as "removed"
is the (unused and AFAICT incomplete/broken) QDMA support.

Most of the "removed" stuff isn't visible to any driver
using the EDMA interface.  It was just code bloat.


> 1)  EDMA3 resources.  There are several different resources
> that are shared among all users of EDMA3 (i.e. ARM, DSP, etc.).

Shared using the "shadow regions" in some cases.  Does the
DSP software use region 1?  I'd hope it does...

I think that only two types of resource really need to be
in programmers' brains:  the (64) hardware channels managed
through shadow regions; and the (128) PaRAM entries used
to work with them.  The current programming interface is
working a bit too hard to elide the distinction, IMO.


>       A)  Channels - Each channel has a hard-coded event
> associated with it, e.g. McBSP0 RX, etc.

This driver refers to each PaRAM slot as a "logical channel"
(and did so before I touched it).  That's not entirely a
useful simplification.  The term used for what you describe
is "master channel".

And of course, not all of the (64 on DaVinci, 32 on OMAP-L1xx)
master channels have hardware event triggers ... they *can* be
used to start DMA in two other ways though.

(Unrelated:  why do people call ASP modules "McBSP" modules??
Just carrying forward obsolete terminology?  It's confusing,
and needlessly so...)


>       B)  Parameter RAMs - Generally speaking a single channel
> will utilize multiple parameter RAM entries.  One of them is
> the "active" entry and others are utilized as "reload" entries,

Or as termed in this code:  "master" and "slave".  I think
that talking about "reload" would be clearer than "slave",
although the word doesn't peer as cleanly with "master".


IMO it's worth thinking about how to improve the API for
EDMA, but that's not the point of these patches.  Except
in the very weak senses of:

 - Removing unused mechanisms.  When stuff like that is
   removed, it's easier to see the core structure:  what's
   right, what's wrong, what's error-prone, etc.

 - Documentation improvements.  Most of what was there
   before was just call syntax summary; and a lot of that
   was just wrong.  (Missing parameters, specifying return
   codes from void functions, etc.)

A related discussion is whether (and if so, how) to fit
into the <linux/dmaengine.h> framework.  (Currently, I'd
suggest avoiding it:  high overhead, few benefits.)

The OMAP-L1xx chips seem to be DaVinci technology with
an OMAP brand, so I'd also think it's worth thinking
about how to clean up this DaVinci EDMA code so it can
be shared with the OMAP-L1xx support.  An obvious step
is replacing "davinci" with "edma" in the names.


> e.g. you could implement ping pong buffering with zero CPU overhad
> in this way.

Or scatterlist handling, e.g. for MMC.  Or the funkier
flavors of buffer sequences, as with SPI.


> Furthermore, some devices actually allow you to map 
> the "active" entry to any channel you want, i.e. it is programmable.

Presumably you're thinking about QDMA "devices" there.
I don't see that otherwise, on dm355 or dm6446...

 
>       C)  TCCs (Transfer Completion Codes) - The TCCs are
> used in a couple different ways. 

And each master channel is associated with a TCC, one-to-one.

In terms of programming model, developers must not use TCCs
they haven't allocated ... so it's not very useful to think
of them as a different type of resource.

Each PaRAM slot has a field encoding one TCC, and some bits
controlling its uses (irq, chaining).  Easier to think of
TCC usage as a policy associated with PaRAM data than a
resource associated with a channel or anything else.  It's
also worth noting that driver bugs with respect to TCC usage
could wreak much havoc...


>               a. Interrupts - although there are 64 channels
>               and 64 interrupts they are not necessarily
>               a one-to-one mapping.  This is programmable
>               through the TCC field.

Right, although in terms of *error* reporting right now the
interrupt callbacks are always one-to-one.  It's only the
DMA completion events (two flavors) that can be remapped.

An API omission:  TC error reporting, including status about
just what broke.  Not that TC errors are handled at all just
now; but if they were, drivers couldn't do anything useful.


>               b. Chaining - The completion of one channel can
>               initiate a transfer on another channel.  For example,
>               if you were transferring planar YUV video data you
>               could set it up such that the completion of the 'Y'
>               transfer initiates the 'U' transfer which then
>               initiates the 'V' transfer.  The channel you initiate
>               is again specified by the TCC code (same as interrupt).      

Yep.  Nothing in Linux now uses chaining, and I happen to think
it's on the other side of a small boundary:  the one where I
think drivers *should* be working in terms of PaRAM slots and
updating param.opt bitfields directly, instead of using those
fiddly "modify these two PaRAM bit fields" calls which seem
intended to help mask the distinction between hardware channels
and the PaRAM slots driving them.


> 2)  QDMA - the QDMA is intended for memory to memory copies.
> It can be a little quicker to kick off one of these transfers. 

Right.  If we were to have a DMA version of copy_page(), or
some bitblt utilities for framebuffer/video support, I'd expect
to try using QDMA there:  flush tx dcache, start dma, purge rx
dcache, wait for complete, done.


> Note, that I'm not necessarily disagreeing with changes you made
> (though I haven't fully reviewed).  There is a tradeoff here.

When you write "tradeoff" ... oooh, it sounds like engineering!!


> If you consolidate some of these parameters you are
> shrinking/simplifying the code, but you are also potentially
> limiting yourself in what you can do with the EDMA3 peripheral.

That consolidation has already been done -- by whoever in TI
came up with this programming interface several years ago.

To oversimplify, I see two basic audiences for the current
EDMA programming interface.  Both use core calls like

        davinci_request_dma
        davinci_free_dma
        davinci_start_dma
        davinci_stop_dma

Note that those calls encapsulate resource allocation and
management of channel registers in EDMA shadow regions.
There are a few more such "management" calls.

But then the set of calls diverges for PaRAM handling:

 - Developer just needs to get DMA working; and is happy
   with a simplified model, and calls enforcing that model.
   PaRAM calls aimed at these developers:

        davinci_set_dma_src_params
        davinci_set_dma_dest_params
        davinci_set_dma_src_index
        davinci_set_dma_dest_index
        davinci_set_dma_transfer_params
        davinci_dma_link_lch
        davinci_dma_unlink_lch

 - Developer suffers under that simplified model.  They
   need more than it exposes (e.g. chaining), and will
   work directly at the level of PaRAM entries.  Here,
   two PaRAM calls are more than sufficient:

        davinci_set_dma_params
        davinci_get_dma_params

Note that most of the current interfaces mask the EDMA
resources you listed.  Even the *names* don't suggest
that those PaRAM calls are fundamentally different from
the ones managing hardware channels.  Which, to this
experienced interface designer, suggests trouble...


> I'll let you (and the community) make the final decision as
> to what is the right balance, though I wanted to make sure
> you knew why some of those things were in the code to begin with.

Actually, your comments seem to be more on the "why those
things are in the hardware" side of things...

I hope I've reassured you somewhat.  The hardware capabilities
can still be accessed, as well as before.  It's just that the
code doing so has had a *LOT* of needless cruft removed.

- Dave


 
> 
> Brad
> 
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf
> > Of David Brownell
> > Sent: Monday, January 05, 2009 1:40 AM
> > To: DaVinci
> > Subject: [patch 2.6.28-rc8-davinci1-git+ 0/8] EDMA cleanup/shrinkage
> > 
> > I notice Troy's set of EDMA cleanups came at the beginning
> > of December ... another month, another set of EDMA patches?  :)
> > 
> > 
> > These patches began with some frustration when I was trying
> > to make sense of the current EDMA code.  Basically, there's
> > just **NO** excuse for it to be so complex.
> > 
> > Result:  object code down by over 55%; source by over 30%.

Note that "object" means "code + data".  Most of that 55%
shrinkage is, obviously, data.


> > (Compared to what's in current GIT.)
> > 
> >  - Remove QDMA; it's got a poor and unused interface
> >  - Remove duplicative IRQ bitmap, and related cleanup
> >  - Remove needless "ARM-side" hacks and unused fields
> >  - Remove duplicative "in_use" field; use atomic bitops
> >  - Misc cleanup after those patches
> >  - Messaging tweaks, switch most stuff to kerneldoc
> >  - Remove some unused calls
> >  - Shrink the object code a bit more
> > 
> > I'm thinking this gets it a lot closer to something that
> > could go to mainline...
> > 
> > 
> > These go on top of the DM355 MMC and DMA patches I sent
> > before the holidays.  It's been sanity tested with MMC and
> > audio, the only two drivers in the GIT tree that use DMA.
> > 
> > 
> 

_______________________________________________
Davinci-linux-open-source mailing list
[email protected]
http://linux.davincidsp.com/mailman/listinfo/davinci-linux-open-source

Re: [patch 2.6.28-rc8-davinci1-git+ 0/8] EDMA cleanup/shrinkage

Reply via email to