My setup is Linux PPC kernel 2.4.30 on an embedded PPC405GPr. The board has some image processing devices including compressors. I'm working with high image rates so performance is an issue.
The drivers for the PCI-based compressor chips support readv and use map_user_kiobuf and pci_map_single to map the output buffers for the read. (The devices do scatter DMA.) This is too slow, though. More time is spent mapping than compressing!

I did some measurements, and it seems that the vast majority of the time is spent in pci_map_single, which calls only the consistent_sync function, which for FROMDEVICE calls only invalidate_dcache_range. So I'm convinced that invalidating the cache for the output buffer (which is large, in case the image that arrives is large) is taking most of the time.

So I want to eliminate it. And the way I want to do that is to have a heap of memory in the user-mode process mapped uncached. The hope is that I can pass that through the readv to the driver, which sets up the DMA. Then I can skip the pci_map_single (and thus the invalidate_dcache_range), saving lots of time.

Plan B would be to have a driver allocate the heap of memory, but I really need the mapping into user mode to be uncached, as the processor does some final touch-up (header et al.) before sending it to the next device.

-- 
Steve Williams                "The woods are lovely, dark and deep.
steve at icarus.com           But I have promises to keep,
http://www.icarus.com         and lines to code before I sleep,
http://www.picturel.com       And lines to code before I sleep."
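
P.S. To give a rough sense of why the invalidate cost follows the size of the mapped buffer rather than the size of the image that actually arrives, here is a small user-space sketch that counts the cache lines covered by a byte range. It is only an illustration, not the kernel code: the helper name dcache_lines_in_range is hypothetical, and it assumes 32-byte data cache lines (as on the 405 core) and an invalidate that walks the range one line at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte data cache lines, as on the PPC405 core. */
#define DCACHE_LINE_BYTES 32u

/*
 * Hypothetical helper: number of cache lines touched by the byte range
 * [start, start + len). An invalidate loop that steps through the range
 * one line at a time does this many operations, so the count is a rough
 * proxy for the cost of mapping the buffer for FROMDEVICE.
 */
static unsigned long dcache_lines_in_range(uintptr_t start, size_t len)
{
    uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_BYTES - 1);
    uintptr_t first, last;

    if (len == 0)
        return 0;

    /* Guard against address wrap-around in start + len - 1. */
    assert(len - 1 <= UINTPTR_MAX - start);

    first = start & mask;             /* line holding the first byte */
    last = (start + len - 1) & mask;  /* line holding the last byte */
    return (unsigned long)((last - first) / DCACHE_LINE_BYTES) + 1;
}
```

With made-up sizes, a line-aligned 4 MiB worst-case buffer covers 131072 lines, while a 256 KiB image would cover only 8192, so mapping the whole buffer does 16 times the work the data needs.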