On Friday 20 June 2008, Paul Mackerras wrote:
> Transferring data over loopback is possibly an exception to that.
> However, it's very rare to transfer large amounts of data over
> loopback, unless you're running a benchmark like iperf or netperf. :-/
Well, it is exactly the case that came up in a real-world scenario for cell: in a network-intensive application where the SPUs are supposed to do all the work, we ended up not getting enough data in and out through gbit ethernet because the PPU spent much of its time in copy_to_user. Going to 10gbit will make the problem even more apparent.

I understand that optimizing for this case will cost extra branches for the other cases, but maybe we can find a better compromise than we had before. Can you name a test case that you consider representative of real-life workloads and important to optimize for?

Doing some static compile-time analysis, I found that most of the call sites (which are not necessarily most of the run-time calls) either pass a small constant size of less than a few cache lines, or have a variable size but are not at all performance critical.

Since the prefetching and cache line size awareness were most of the improvement for cell (AFAIU), maybe we can annotate the few interesting cases, say by introducing a new copy_from_user_large() function that can easily be optimized for large transfers on a given CPU, while the remaining code keeps optimizing for small transfers and may even get rid of the full-page-copy optimization in order to save a branch. A rough sketch of what I have in mind follows below my signature.

	Arnd <><
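To make this concrete, here is a minimal sketch, assuming we add the annotation as an overridable generic helper. copy_from_user_large() does not exist today and the name is made up for illustration; the generic fallback just reuses the existing copy_from_user(), so architectures that don't provide a tuned variant behave exactly as they do now:

#include <linux/uaccess.h>

/*
 * Hypothetical helper for call sites that are known to move large,
 * performance-critical buffers. An architecture can supply its own
 * version tuned for long transfers (prefetching, cache-line-sized
 * loops, etc.); this generic fallback keeps the current behaviour
 * of the small-transfer-optimized copy.
 */
#ifndef copy_from_user_large
static inline unsigned long
copy_from_user_large(void *to, const void __user *from, unsigned long n)
{
        return copy_from_user(to, from, n);
}
#endif

An annotated call site that moves whole pages at a time would then change from

        if (copy_from_user(dst, ubuf, len))
                return -EFAULT;

to

        if (copy_from_user_large(dst, ubuf, len))
                return -EFAULT;

and everything else keeps the small-transfer path without any extra branches.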