On 2017/03/28 11:44AM, Michael Ellerman wrote:
> "Naveen N. Rao" <naveen.n....@linux.vnet.ibm.com> writes:
>
> > diff --git a/arch/powerpc/lib/mem_64.S b/arch/powerpc/lib/mem_64.S
> > index 85fa9869aec5..ec531de99996 100644
> > --- a/arch/powerpc/lib/mem_64.S
> > +++ b/arch/powerpc/lib/mem_64.S
> > @@ -13,6 +13,23 @@
> >  #include <asm/ppc_asm.h>
> >  #include <asm/export.h>
> >
> > +_GLOBAL(__memset16)
> > +	rlwimi	r4,r4,16,0,15
> > +	/* fall through */
> > +
> > +_GLOBAL(__memset32)
> > +	rldimi	r4,r4,32,0
> > +	/* fall through */
> > +
> > +_GLOBAL(__memset64)
> > +	neg	r0,r3
> > +	andi.	r0,r0,7
> > +	cmplw	cr1,r5,r0
> > +	b	.Lms
> > +EXPORT_SYMBOL(__memset16)
> > +EXPORT_SYMBOL(__memset32)
> > +EXPORT_SYMBOL(__memset64)
>
> You'll have to convince me that's better than what GCC produces.

Sure :) I got lazy last night and didn't post the test results...

I hadn't tested zram yesterday; I had only run a naive test module that
memsets a large 1GB buffer with integers. With that test, I saw:

without patch:  0.389253910 seconds time elapsed  ( +- 1.49% )
with patch:     0.173269267 seconds time elapsed  ( +- 1.55% )

... which is better than 2x (about 2.2x).

I also tested zram today with the command shared by Wilcox:

without patch:  1.493782568 seconds time elapsed  ( +- 0.08% )
with patch:     1.408457577 seconds time elapsed  ( +- 0.15% )

... which also shows an improvement (about 6% less elapsed time), along
the same lines as x86, as reported by Minchan Kim.

- Naveen
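
For anyone following the asm in the quoted patch, here is a rough C sketch of what the three entry points appear to compute before branching to .Lms: the fill value is widened to 64 bits by replication, and the number of bytes needed to reach 8-byte alignment is derived from the destination pointer. The helper names are made up for illustration and are not part of the patch; this is only a reading of the instructions, not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a 16-bit pattern into both halves of a
 * 32-bit word, which is what "rlwimi r4,r4,16,0,15" seems to do. */
static inline uint32_t splat16_to_32(uint16_t v)
{
	return ((uint32_t)v << 16) | v;
}

/* Hypothetical helper: copy a 32-bit pattern into both halves of a
 * 64-bit doubleword, mirroring "rldimi r4,r4,32,0". */
static inline uint64_t splat32_to_64(uint32_t v)
{
	return ((uint64_t)v << 32) | v;
}

/* Hypothetical helper: bytes from p up to the next 8-byte boundary,
 * i.e. (-p) & 7, mirroring "neg r0,r3; andi. r0,r0,7". */
static inline size_t bytes_to_align8(const void *p)
{
	return (size_t)(((uintptr_t)0 - (uintptr_t)p) & 7);
}
```

The fall-through layout means a 16-bit fill goes through both widening steps, so all three entry points reach the shared path with a full 64-bit pattern in r4.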