Hi Cyril,

On Sat, Sep 23, 2017 at 12:06:48AM +1000, Cyril Bur wrote:
> On Thu, 2017-09-21 at 07:34 +0800, wei.guo.si...@gmail.com wrote:
> > From: Simon Guo <wei.guo.si...@gmail.com>
> >
> > This patch add VMX primitives to do memcmp() in case the compare size
> > exceeds 4K bytes.
> >
>
> Hi Simon,
>
> Sorry I didn't see this sooner, I've actually been working on a kernel
> version of glibc commit dec4a7105e (powerpc: Improve memcmp performance
> for POWER8) unfortunately I've been distracted and it still isn't done.

Thanks for syncing with me. Let's consolidate our efforts :)
I had a quick look at glibc commit dec4a7105e. It looks like the
aligned-case comparison with VSX is entered without any rN size
limitation, which means it pays the VSX register load penalty even when
the length is only 9 bytes. It does add some optimization for the case
where the src/dst addresses don't share the same offset relative to the
8-byte alignment boundary. I need to read it more closely.

> I wonder if we can consolidate our efforts here. One thing I did come
> across in my testing is that for memcmp() that will fail early (I
> haven't narrowed down the the optimal number yet) the cost of enabling
> VMX actually turns out to be a performance regression, as such I've
> added a small check of the first 64 bytes to the start before enabling
> VMX to ensure the penalty is worth taking.

Will there still be a penalty if the 65th byte differs?

> Also, you should consider doing 4K and greater, KSM (Kernel Samepage
> Merging) uses PAGE_SIZE which can be as small as 4K.

Currently VMX is only applied when the size exceeds 4K. Are you
suggesting a bigger threshold than 4K?

We can sync more offline for v3.

Thanks,
- Simon
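
P.S. Just so we are talking about the same overall structure for v3,
here is a rough, untested C sketch of the strategy being discussed: a
cheap scalar pre-check of the first bytes so an early mismatch never
pays the vector-enable cost, plus a size threshold below which VMX is
not used at all. The memcmp_sketch()/vmx_compare_loop() names and the
64/4096 constants are placeholders, not taken from either patch; the
real kernel code would be powerpc assembly with the vector section
wrapped in enable_kernel_altivec()/disable_kernel_altivec().

	/*
	 * Illustrative only -- not the actual patch.  A plain memcmp()
	 * stands in for the VMX compare loop here so the sketch builds
	 * and runs anywhere.
	 */
	#include <stddef.h>
	#include <string.h>

	#define PRECHECK_BYTES	64	/* scalar bytes checked before enabling VMX */
	#define VMX_THRESHOLD	4096	/* below this, VMX setup cost is not worth it */

	/* Placeholder for the vectorised compare of the remaining bytes. */
	static int vmx_compare_loop(const void *s1, const void *s2, size_t n)
	{
		return memcmp(s1, s2, n);	/* stand-in for the VMX loop */
	}

	int memcmp_sketch(const void *s1, const void *s2, size_t n)
	{
		const unsigned char *p1 = s1, *p2 = s2;
		size_t pre = n < PRECHECK_BYTES ? n : PRECHECK_BYTES;
		size_t i;

		/*
		 * Scalar pre-check: an early mismatch returns before any
		 * vector state is touched, so short or quickly-failing
		 * compares do not pay the VMX enable penalty.
		 */
		for (i = 0; i < pre; i++) {
			if (p1[i] != p2[i])
				return p1[i] - p2[i];
		}
		if (n == pre)
			return 0;

		/* Small/medium sizes: finish with a plain scalar compare. */
		if (n < VMX_THRESHOLD)
			return memcmp(p1 + pre, p2 + pre, n - pre);

		/* Large sizes: worth enabling VMX for the bulk of the compare. */
		return vmx_compare_loop(p1 + pre, p2 + pre, n - pre);
	}

The open questions above map directly onto the two constants: the
65th-byte case is about what happens right after the pre-check, and the
KSM point is about where VMX_THRESHOLD should sit relative to PAGE_SIZE.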