Hi!
> But I hope that you have the necessary infrastructure using the dmaengine
> subsystem for this, or that changes requires will be proposed to that
> first or together with these changes.
>
> As you will be using dmaengine (I guess?) maybe a lot of this can
> actually be handled directly in the core since that code should be
> pretty generic, or in a separate file like spi-dmaengine-chain.c?
I have to admit I have not been using that infrastructure so far - I am a bit
uncertain how to make it work yet, and I first wanted the prototype working.
Also, the kernel on the Raspberry Pi (on which I do the development) is not
fully upstream (I know), and some parts are not really implemented.
I know of people who only run the upstream kernel on an RPi, but there are
other limitations there with regard to some drivers. It would also take a bit
of experimentation to get it into working order - time I do not want to
spend at the moment...
Finally, I am also not sure how dmaengine can work when two DMA channels are
required to run in parallel (one for RX, the other for TX).
All this means that I have to schedule the RX DMA from the TX DMA and vice
versa - in addition to configuring the SPI registers via DMA and working
around those HW bugs.
Whether the DMA engine would be able to support all of this I have no idea yet.
But with all that said, I can do a single spi_message with 5 transfers, 2 of
which have cs_change set, purely with DMA and without any interrupts - besides
the final interrupt that triggers the wakeup of the spi_message_pump thread.
From there it is obviously not that much more complicated to coalesce multiple
spi_messages into a single running DMA chain - it is just a matter of appending
the additional transfers to the DMA chain and making sure the DMA is still
running after we have added them.
The complication then comes mostly in the form of memory management (especially
releasing DMA control blocks), locking, etc. - things one does not have to take
much care of with the transfer_one_message interface.
So to put it into perspective:
My main goal is an efficient CAN driver for the MCP2515 chip, which sits on
the SPI bus of the RPi.
Right now I can receive about 3200 messages per second from the CAN bus (close
to the absolute maximum for messages 8 bytes in length - I could more than
double the packet count by reducing the packet size to 0) with this driver
(plus a version of the mcp2515 driver that uses the async interface and
advanced scheduling of messages to "leverage" concurrency - also self-written).
With the "stock" upstream spi-bcm2835 driver this workload causes around 50k
interrupts/s and a similar number of context switches, and it still loses
packets.
With the current incarnation of the spi-bcm2835dma driver (using the
transfer_one_message interface) I run at around 16500 interrupts/s and 22000
context switches/s.
What is still biting me most from the transfer perspective is that there are
too many interrupts and context switches, which introduces unnecessary latency.
With spi-bcm2835dma using the transfer interface I estimate I would get down
to 6400 interrupts/s (two per message) and 0 context switches. All of this
should also have a positive impact on CPU utilization - no longer 80% system
load due to scheduling/DMA overhead...
As for prepare_spi_message: I was asking for something different from
prepare_transfer_hardware (unless there is new code in 3.12 that already
includes that - or it is in a separate tree).
The latter prepares the HW in some way - say, waking up a separate thread.
What I would like to see is something similar to prepared statements in SQL:
prepare the DMA control blocks once for a message (you may not change the
structure/addresses, but you may change the data you are transferring), and
then, when the message gets submitted via spi_async and handed to the driver
via transfer, the driver uses the prepared DMA chain to attach it to the DMA
queue. This would remove the need to recalculate those DMA control blocks
every time - in my case above I run the same computations 3200 times/s,
including dmapool_alloc/dmapool_free/... Obviously it ONLY makes sense for
SPI transfers that have is_dma_mapped=1 - otherwise I would have to go through
the loops of dma_map_single_attrs every time...
So this is to fill in the context for my questions regarding why the "transfer"
interface is deprecated, and to provide a rationale for why I would want to use
it.
Ciao,
Martin
P.S.: For completeness: yes, I can handle speed_hz and delay_usecs on a
per-spi_transfer basis in the DMA chain as well - probably coming closer to the
"real" requested delay than the sequence interrupt -> interrupt handler ->
wake up the pump thread -> (inside the transfer_one_message handler) process
the other arguments of the xfer that are applied after the transfer ->
udelay(xfer->delay_usecs).
On the RPi something like this takes about 0.1 ms from one message to the next
getting delivered - ok, that includes some overhead (like calculating the DMA
control blocks), but it still shows the order of magnitude you can expect when
you have to wait for the message pump to get scheduled (even with realtime
priority).
So a delay_usecs=100 would already be in the range of what we would see
naturally from the design on a low-power single-core device - and that would
result in waiting far too long. With DMA, on the other hand, I can get the
timing correct to within +-5% of the requested value (jitter due to other
memory transfers that block the memory bus - but this can possibly be
improved).
I will not even mention that I believe this SPI DMA driver opens up the
possibility of reading two 24-bit ADCs at a rate of 200k samples/s with minimal
jitter on this simple hardware at 20 MHz SPI bus speed (some calibration may be
required...).
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html