Hi!

> But I hope that you have the necessary infrastructure using the dmaengine
> subsystem for this, or that changes requires will be proposed to that
> first or together with these changes.
> 
> As you will be using dmaengine (I guess?) maybe a lot of this can
> actually be handled directly in the core since that code should be
> pretty generic, or in a separate file like spi-dmaengine-chain.c?

I have to admit I have not been using that infrastructure so far - I am still a 
bit uncertain how to make it work, and I first wanted to get the prototype working.

Also, the kernel on the Raspberry Pi (on which I do the development) is not 
fully upstream (I know), and some parts are not really implemented.
I know of people who are running only the upstream kernel on an RPi, but there are 
other limitations there with regard to some drivers. Getting it into working order 
would also require a bit of experimentation - time I do not want to spend at the 
moment...

Finally, I am also not sure how dmaengine can work when 2 DMAs are required to 
run in parallel (one for RX, the other for TX).
This means that I have to schedule the RX DMA from the TX DMA and vice 
versa - in addition to configuring the SPI registers via DMA and working 
around those HW bugs.

Whether the DMA engine would be able to support all of this I have no idea yet.
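
To illustrate what I mean by "scheduling the RX DMA from the TX DMA": one way to 
do it on the BCM2835 is to put two extra control blocks into the TX chain that 
write the RX channel's registers. The sketch below uses my own struct/field names 
for illustration (the layout and offsets follow the BCM2835 datasheet, this is not 
a verbatim excerpt of the driver):

/*
 * Sketch only - layout per the BCM2835 datasheet, names made up here.
 */
struct bcm2835_dma_cb {		/* DMA control block, 32-byte aligned */
	u32 ti;			/* transfer information */
	u32 src;		/* source bus address */
	u32 dst;		/* destination bus address */
	u32 len;		/* transfer length in bytes */
	u32 stride;
	u32 next;		/* bus address of the next CB, 0 = stop */
	u32 pad[2];
};

/*
 * Inside the TX chain, two CBs act as the "scheduler" for the RX channel
 * (channel register offsets: CS = 0x00, CONBLK_AD = 0x04):
 *
 *   CB n:   copy 4 bytes holding the bus address of the first RX CB
 *           into <RX channel base> + 0x04 (CONBLK_AD)
 *   CB n+1: copy 4 bytes holding a CS value with the ACTIVE bit (bit 0)
 *           set into <RX channel base> + 0x00 (CS)
 *
 * Once CB n+1 has completed, the RX channel runs its own chain in
 * parallel while the TX channel continues with the SPI register setup
 * for the next transfer - and the RX chain can do the same trick in the
 * other direction.
 */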

But with all that said, I can already run a single spi_message with 5 transfers, 
2 of which have cs_change set, entirely via DMA and without any interrupts - 
besides the final interrupt that triggers the wakeup of the spi message pump thread.

So from there it is obviously not that much more complicated to coalesce 
multiple spi_messages into a single continuously running DMA chain - it is just 
a matter of appending the additional transfers to the DMA chain and making sure 
the DMA is still running after we have appended them (sketched below).
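
Conceptually the append step is no more than the following - again only a sketch 
with assumed names (CS/CONBLK_AD/ACTIVE are from the BCM2835 datasheet); the race 
window in the middle is exactly where the locking fun starts:

#include <linux/io.h>
#include <linux/types.h>

#define BCM2835_DMA_CS		0x00
#define BCM2835_DMA_CONBLK_AD	0x04
#define BCM2835_DMA_ACTIVE	(1 << 0)

/* append a new message's CB chain behind the tail of the running chain */
static void append_to_dma_chain(void __iomem *chan_base,
				struct bcm2835_dma_cb *tail,
				dma_addr_t first_new_cb)
{
	/* link the new chain behind the current tail CB */
	tail->next = first_new_cb;
	wmb();		/* the DMA must see the updated link before any restart */

	/*
	 * If the channel has already run off the end of the old chain
	 * (or stops right after the check - that is the race that needs
	 * to be handled), restart it at the newly appended chain.
	 */
	if (!(readl(chan_base + BCM2835_DMA_CS) & BCM2835_DMA_ACTIVE)) {
		writel(first_new_cb, chan_base + BCM2835_DMA_CONBLK_AD);
		writel(BCM2835_DMA_ACTIVE, chan_base + BCM2835_DMA_CS);
	}
}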

The complication then comes mostly in the form of memory management (especially 
releasing DMA control blocks), locking, etc. - things one does not have to care 
about as much with the transfer_one_message interface.

So to put it into perspective:

My main goal is to get an efficient CAN driver for the mcp2515 chip, which sits 
on the SPI bus of the RPi.
Right now I can receive about 3200 messages per second from the CAN bus (close 
to the absolute maximum for messages of 8 bytes in length - I could more than 
double the packet count by reducing the packet size to 0) with this driver, plus 
a version of the mcp2515 driver that uses the async interface and advanced 
scheduling of messages to "leverage" concurrency - also self-written.

With the "stock" spi-bcm2835 driver that is upstream it uses around 50k 
interrupts and a similar amount of context switches  and still looses packets.
With the current incarnation of the spi-bcm2835dma driver (using the 
transfer_one_message interface) I run at around 16500 interrupts/s and 22000 
context switches.
Still what is biting me the most from the transfer perspective is the fact that 
there are still too many interrupts and context-switches, which introduces too 
much latency-unnecessarily.

So with spi-bcm2835dma using the transfer interface, I estimate I would get down 
to 6400 interrupts/s and 0 context switches. All this should also have a positive 
impact on CPU utilization - no longer 80% system load due to scheduling/DMA 
overhead...

As for prepare_spi_message - I was asking for something different from 
prepare_transfer_hardware (unless there is new code in 3.12 that already 
includes that, or it is in a separate tree).
prepare_transfer_hardware prepares the HW in some way - say, by waking up a 
separate thread, ...
What I would like to see instead is something similar to prepared statements in 
SQL: prepare the DMA control blocks once for a message (you may not change the 
structure/addresses, but you may change the data you are transferring), and when 
the message later gets submitted via spi_async and then via transfer to the 
driver, it would make use of the prepared DMA chain to get attached to the 
DMA queue. This would remove the need to recalculate those DMA control blocks 
every time - in my case above I run the same computations 3200 times/s - 
including dmapool_alloc/dmapool_free/... Obviously it ONLY makes sense for 
SPI transfers that have is_dma_mapped=1 - otherwise I would have to jump through 
the hoops of dma_map_single_attrs every time....
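
Just to make the idea concrete, I am thinking of something along these lines - 
purely hypothetical, no such API exists in the SPI core today:

#include <linux/spi/spi.h>

/* opaque handle holding the precomputed DMA control-block chain */
struct spi_prepared_message;

/*
 * Build the DMA control blocks for msg once. Only valid for messages
 * with is_dma_mapped=1, since the buffer bus addresses must not change
 * afterwards - only the data inside the buffers may.
 */
int spi_prepare_message(struct spi_device *spi, struct spi_message *msg,
			struct spi_prepared_message **prep);

/* submit without recomputing or reallocating anything */
int spi_async_prepared(struct spi_device *spi,
		       struct spi_prepared_message *prep);

/* release the control blocks (back to the dmapool) */
void spi_unprepare_message(struct spi_device *spi,
			   struct spi_prepared_message *prep);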

So this is to fill in the context for my questions regarding why the "transfer" 
interface is deprecated and to provide a rationale for why I would want to use 
it.

Ciao,
                Martin

P.S.: And for completeness: yes, I can handle speed_hz and delay_usecs on a 
per-spi_transfer basis in the DMA chain as well - probably coming closer to the 
"real" requested delay than the chain of steps interrupt -> interrupt handler -> 
wakeup of the pump thread (inside the transfer_one_message handler) -> processing 
of the other xfer arguments that are applied after the transfer -> 
udelay(xfer->delay_usecs).
On the RPi something like this takes about 0.1ms from one message to the next 
getting delivered - ok, that includes some overhead (like calculating the DMA 
control blocks), but it still shows the order of magnitude you can expect when 
you have to wait for the message pump to get scheduled (even with realtime 
priority).
So a delay_usecs=100 would already be in the range of what we would see 
naturally from the design on a low-power single-core device - and that would 
result in waiting way too long. With DMA, on the other hand, I can get the timing 
correct to +-5% of the requested value (the jitter comes from other memory 
transfers that block the memory bus - but this can possibly be reduced further).
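
For illustration: if the delay were realized as a dummy, clock-paced transfer 
inside the DMA chain (just one possible way to do it - not saying this is what 
the driver will end up doing), the length would be a simple calculation:

#include <linux/kernel.h>
#include <asm/div64.h>

/* number of dummy bytes that take roughly delay_usecs at speed_hz */
static u32 delay_to_dummy_bytes(u32 speed_hz, u16 delay_usecs)
{
	u64 cycles = (u64)speed_hz * delay_usecs;

	do_div(cycles, 1000000);		/* SPI clock cycles in the delay */
	return DIV_ROUND_UP((u32)cycles, 8);	/* 8 clock cycles per byte */
}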

I will not even mention that I believe this SPI-DMA driver opens up the 
possibility of reading 2 24-bit ADCs at a rate of 200k samples/s with minimal 
jitter on this simple hardware at 20MHz SPI bus speed (some calibration may be 
required...).

