On Fri, Nov 01, 2013 at 01:26:52PM -0700, Joe Perches wrote:
> On Fri, 2013-11-01 at 15:58 -0400, Neil Horman wrote:
> > On Fri, Nov 01, 2013 at 12:45:29PM -0700, Joe Perches wrote:
> > > On Fri, 2013-11-01 at 13:37 -0400, Neil Horman wrote:
> > >
> > > > I think it would be better if we just did the prefetch here
> > > > and re-addressed this area when AVX (or adcx/adox) instructions
> > > > were available for testing on hardware.
> > >
> > > Could there be a difference if only a single software
> > > prefetch was done at the beginning of transfer before
> > > the while loop and hardware prefetches did the rest?
> > >
> > I wouldn't think so.  If hardware was going to do any prefetching based on
> > memory access patterns it will do so regardless of the leading prefetch, and
> > that first prefetch isn't helpful because we still wind up stalling on the
> > adds while it's completing.
>
> I imagine one benefit to be helping prevent
> prefetching beyond the actual data required.
>
> Maybe some hardware optimizes prefetch stride
> better than 5*64.
>
> I wonder also if using
>
> 	if (count > some_length)
> 		prefetch
> 	while (...)
>
> helps small lengths more than the test/jump cost.
>
We've already done this, and it is in fact the best performing.  I'll be
posting that patch, along with Ingo's request to add do_csum to the perf bench
code, when I have that done.

Best
Neil
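
For readers following the thread, the guarded-prefetch shape quoted above could
be sketched roughly as below. This is only an illustration, not the patch under
discussion: the name csum_sketch, the threshold PREFETCH_MIN_LEN, the prefetch
target, and the plain 16-bit end-around-carry sum are all assumptions, and
__builtin_prefetch (a GCC/Clang builtin) stands in for the kernel's prefetch().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff: buffers at or below this size skip the prefetch. */
#define PREFETCH_MIN_LEN 256

/*
 * Illustrative only: sum a buffer as little-endian 16-bit words and fold
 * the carries back in. Not the kernel's do_csum().
 */
static uint16_t csum_sketch(const unsigned char *buf, size_t count)
{
	uint64_t sum = 0;

	/* One software prefetch up front, and only for longer buffers. */
	if (count > PREFETCH_MIN_LEN)
		__builtin_prefetch(buf + 64);	/* assumed target: next 64 bytes */

	while (count >= 2) {
		sum += (uint64_t)buf[0] | ((uint64_t)buf[1] << 8);
		buf += 2;
		count -= 2;
	}
	if (count)
		sum += buf[0];	/* trailing odd byte */

	/* Fold the accumulated carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}
```

Short buffers pay only the compare and branch, which is the trade-off the
question above is about. Where the cutoff should sit would have to come from
measurement.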