On 15.11.2013, at 14:33, Mark Brown wrote:
> No, this isn't something which should be being open coded in individual
> drivers - it's far too much redundant work especially once you get into
> doing the tradeoffs for no queue vs queue cases. Using regmap MMIO may
> be helpful but we need to do that in a way that allows us to integrate
> it into the DMA pipeline...
>
> This is part of what I keep saying about doing things in a way that
> makes them generally applicable.
Well - we first need to see what the real "common denominator" is among
boards that can do DMA to configure SPI itself. We would need at least
2 or 3 of them to figure that out (and also to find all the bugs we may
expect to see and work out workarounds for those).
but in principle it should be possible to factor the processing out
like this:

  main transfer loop:
    while (!transfer_last) {
      prepareSPIRegisters()
      prepareAdditionalTXDMA()
      do {
        append RX/TX transfers
      } while (do_inner)
      append wait_for_clock
      append delay_us
      if (cs_change) {
        append CS_UP
        append wait_for_clock
      }
    }
    if (message->callback)
      append DMAInterrupt_and_position
I believe that might be a generic enough approach to work, if one
factored out the "appends" plus the do_inner loop exit condition into
the master structure.
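To make the idea concrete, here is a minimal sketch of what such a
factored-out "master structure" could look like. All names here are
hypothetical - none of these ops exist in the SPI core today - and the
toy implementation just records the call sequence instead of building
real DMA descriptors:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical ops a DMA-capable bus driver would provide so that a
 * generic core loop can build the DMA chain for a whole message. */
struct dma_chain_ops {
	void (*append_xfer)(void *ctx);       /* append RX/TX descriptors */
	void (*append_wait_clock)(void *ctx); /* wait for SPI clock idle */
	void (*append_delay_us)(void *ctx);
	void (*append_cs_up)(void *ctx);
	void (*append_irq)(void *ctx);        /* completion interrupt */
	bool (*do_inner)(void *ctx);          /* inner-loop exit condition */
};

/* Generic loop matching the outline above: one iteration per transfer. */
static void build_chain(const struct dma_chain_ops *ops, void *ctx,
			int n_transfers, bool cs_change, bool has_callback)
{
	for (int t = 0; t < n_transfers; t++) {
		do
			ops->append_xfer(ctx);
		while (ops->do_inner(ctx));
		ops->append_wait_clock(ctx);
		ops->append_delay_us(ctx);
		if (cs_change) {
			ops->append_cs_up(ctx);
			ops->append_wait_clock(ctx);
		}
	}
	if (has_callback)
		ops->append_irq(ctx);
}

/* Toy implementation: each op records one letter of the sequence. */
static char seq[64];
static void rec(char c) { seq[strlen(seq)] = c; }
static void ax(void *c) { (void)c; rec('X'); }
static void aw(void *c) { (void)c; rec('W'); }
static void ad(void *c) { (void)c; rec('D'); }
static void au(void *c) { (void)c; rec('U'); }
static void ai(void *c) { (void)c; rec('I'); }
static bool di(void *c) { (void)c; return false; } /* one xfer per loop */

static const struct dma_chain_ops toy_ops = {
	.append_xfer = ax, .append_wait_clock = aw, .append_delay_us = ad,
	.append_cs_up = au, .append_irq = ai, .do_inner = di,
};
```

The point is only that the loop itself is bus-independent; everything
hardware-specific hides behind the ops.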
But my guess is that you would not allow something as generic as this
to get merged unless there were multiple drivers using that interface.
So we would be in a chicken-and-egg situation from this direction as
well...
>
>> Note that if you do that and want to leverage the "prepare_message",
>> then I would recommend you add an additional flag field to these calls.
>> Because then you may pass some hints to the engine (like if it can
>> use GFP_KERNEL or GFP_ATOMIC during allocations). And prepare
>> could get done with GFP_KERNEL, while a "prepare" during spi_async
>> would need to be done via GFP_ATOMIC.
>
> Or in the async case we just punt the prepare to the thread.
Again: that means an unnecessary delay - especially when you consider
that running the message_pump without real-time priority introduces
10us of additional delay...
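The flag idea quoted above could look roughly like this (a hypothetical
signature, not the current SPI core API, with the gfp types stubbed out
for illustration): the caller tells prepare which allocation context it
is in, so the sync/prepare path may sleep while the spi_async path may
not.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the kernel gfp flags, just for this sketch. */
typedef int gfp_t;
#define GFP_KERNEL 0 /* allocator may sleep */
#define GFP_ATOMIC 1 /* allocator must never sleep */

struct message {
	void *dma_descriptors;
	int may_have_slept;
};

/* Hypothetical prepare hook: spi_prepare_message-style callers would
 * pass GFP_KERNEL, while a prepare triggered inside spi_async
 * (possibly from atomic context) would have to pass GFP_ATOMIC. */
static int prepare_message(struct message *msg, gfp_t gfp)
{
	/* A real implementation would pass gfp through to its
	 * allocator; here we only record which path was taken. */
	msg->may_have_slept = (gfp == GFP_KERNEL);
	msg->dma_descriptors = malloc(64);
	return msg->dma_descriptors ? 0 : -1;
}
```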
> Yes, this is exactly the sort of thing I'm talking about - it's cutting
> out dead time on the bus.
> Right, and the big concern with the way you're currently proposing to
> implement the optimisation is that it's leaving essentially the whole
> thing up to the individual driver. This is going to result in limited
> adoption and is going to mean there's no sharing of learning and fixes
> between drivers.
Well, any driver that is using spi_sync cannot be improved without a
rewrite anyway - the context switching becomes prohibitive.
What some drivers seem to do instead is create a separate thread, which
they wake up from their interrupt handler; that thread then runs the
interrupt handling and communicates via spi_sync.
This means at least 2 context switches: one from the SPI interrupt
(finished) to wake the message pump, which then calls complete and
wakes up the real thread.
Whatever you do on the framework side, you cannot improve the
performance of such drivers without touching the driver itself and
converting it to something else.
It is a whole other story if the driver is already doing async
scheduling of its messages. Then spi_async streamlines the "next"
message to some extent and does not let the message_pump go to sleep,
which gives it better response times.
But you still waste time between when the interrupt handler runs and
when it wakes up the message_pump - especially as the callback is
allowed to run from interrupt context.
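As a toy model of why running the callback from interrupt context
helps: the completion callback can submit the next message itself,
with no worker-thread wakeup in between. This is only an illustrative
sketch (toy_spi_async and chain_next are made-up names; completion is
immediate here, whereas on real hardware it arrives later from the
transfer-complete interrupt):

```c
#include <assert.h>
#include <stddef.h>

struct msg {
	void (*complete)(struct msg *m); /* completion callback */
	struct msg *next;                /* driver's own queue */
};

static int transfers_done;

/* Toy "spi_async": completes synchronously for the sake of the demo. */
static void toy_spi_async(struct msg *m)
{
	transfers_done++;
	if (m->complete)
		m->complete(m);
}

/* Callback chains the next message directly - no context switch, no
 * waiting for a worker thread to be scheduled. */
static void chain_next(struct msg *m)
{
	if (m->next)
		toy_spi_async(m->next);
}
```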
As for learning & fixing:
any driver will go unfixed until someone finds a bug or a performance
issue and spends time on improving it.
What about writing some "documentation" to show how a driver can be
improved for higher throughput?
Then you could also make the framework do more work for the driver,
thus avoiding replication of code. But that would again require
creating a new API just for that, covering all those cases.
Maybe by using a code generator, like the Samba team did for the SMB
messages.
That is a huge separate project with lots of R&D required, which is
way outside the scope of writing a DMA-pipelined bus driver and
proposing things that may improve the performance of optimized drivers.
> To repeat yet again, something that is just a tunnel through to this one
> heavily optimised master driver isn't good. The problem with the
> proposed implementation is that it is essentially just tunnelling
> through to a heavily optimised master, it's not generic enough. If
> other devices can make use of it it's more interesting but then those
> uses ought to be being factored out.
Well - as I understand it, this means:
if there were more bus drivers that could drive the whole SPI transfer
of a message via DMA, then it would become acceptable to get that API
in place; until then, with only a single bus driver and device driver,
it is not acceptable.
So please define what your "critical mass" would be, at which you would
accept that an API extension is worth the effort of getting integrated.
(Besides a metric showing how much it improves the situation, which
would also need to be defined - we do not want performance regressions
either.)
Martin
--