> We'd also have to make sure that the comparison is between the linux-omap
> kernel and the OMAPZoom kernel, rather than o-z PIO vs. o-z DMA.  The
> OMAPZoom kernel doesn't post any device register writes.  That should
> cause any driver using PIO to drag, compared to the l-o kernel.

There are effectively a few levels of posting. The patch stops _ARM_ PIO writes
to peripheral control registers from hanging out in the interconnect. The other
levels are still there. The big winner from interconnect posting on the ARM
side is DDR operations, which still get posted at all levels.  The buffering in
the interconnect is only a couple of cache lines, so it fills up pretty fast.
DMA and other initiators are not impacted, as they don't use ARM MMU attributes.

Really, if you think about it, "big" block writes may not be impacted either, as
you will back up at the slower device's speed. Cache is a familiar example. If
you write out 100M quickly, you very soon reach the point where you are
bottlenecked on main memory speed (every write is a miss or a cast out at some
point). The fact that you have a cache in the way doesn't matter.

It can make some difference if you're intermixing some small PIOs with other
work. In general benchmarks I have yet to see the hit show up at the system
level against all the bigger sources of noise. I can probably construct a case
where ~10% is lost.
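
To make the last two paragraphs concrete, here is a toy model, not a
measurement. The function name, the 8-slot buffer and the 1:10 issue/drain
cost ratio are all made-up assumptions for illustration; none of them come
from OMAP hardware. It only shows the shape of the argument: a posting
buffer hides latency for a short burst, but a long run of writes backs up
at the device's drain rate either way.

```python
def cpu_time_for_writes(n_writes, buffer_slots, issue_cost=1, drain_cost=10):
    """Toy model: time until the CPU is free after issuing n_writes.

    buffer_slots > 0 models posted writes: the CPU stalls only when the
    buffer is full. buffer_slots == 0 models non-posted writes: the CPU
    waits for every write to complete at the device.
    Costs are arbitrary time units (hypothetical, not OMAP numbers).
    """
    cpu = 0      # time at which the CPU is next free
    done = []    # completion time of each write at the device
    for i in range(n_writes):
        if buffer_slots and i >= buffer_slots:
            # Buffer full: wait until the oldest buffered write has drained.
            cpu = max(cpu, done[i - buffer_slots])
        cpu += issue_cost
        # The device handles one write at a time, in order.
        start = max(cpu, done[-1] if done else 0)
        done.append(start + drain_cost)
        if buffer_slots == 0:
            cpu = done[-1]   # non-posted: stall until the device completes
    return cpu


small_posted = cpu_time_for_writes(4, buffer_slots=8)
small_nonposted = cpu_time_for_writes(4, buffer_slots=0)
big_posted = cpu_time_for_writes(100_000, buffer_slots=8)
big_nonposted = cpu_time_for_writes(100_000, buffer_slots=0)
print(small_posted, small_nonposted)    # 4 44
print(big_posted / big_nonposted)       # about 0.91
```

With these assumed costs, a 4-write burst frees the CPU after 4 units when
posted versus 44 when not, while for 100,000 writes the posted case still
takes about 91% of the non-posted time, because the buffer fills and the CPU
ends up running at the device's speed.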

I recall some tests which had DMA + prefetch working on NAND.  I'll see if I
can dig them up.

I was at several meetings a year back where different memory vendors came in
and showed that with some tweaks they could get 2x l-o on flash. Then if you
also take their device-optimized file system you get like 5x. Hopefully at
least the in-tree 2x is gone now that it's a year later.

Regards,
Richard W.
