On Thu, 19 Nov 2015, Måns Rullgård wrote:

> Nicolas Pitre <[email protected]> writes:
> 
> > +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
> > +{
> > +   unsigned long long res;
> > +   unsigned int tmp = 0;
> > +
> > +   if (!bias) {
> > +           asm (   "umull  %Q0, %R0, %Q1, %Q2\n\t"
> > +                   "mov    %Q0, #0"
> > +                   : "=&r" (res)
> > +                   : "r" (m), "r" (n)
> > +                   : "cc");
> > +   } else if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
> > +           res = m;
> > +           asm (   "umlal  %Q0, %R0, %Q1, %Q2\n\t"
> > +                   "mov    %Q0, #0"
> > +                   : "+&r" (res)
> > +                   : "r" (m), "r" (n)
> > +                   : "cc");
> > +   } else {
> > +           asm (   "umull  %Q0, %R0, %Q2, %Q3\n\t"
> > +                   "cmn    %Q0, %Q2\n\t"
> > +                   "adcs   %R0, %R0, %R2\n\t"
> > +                   "adc    %Q0, %1, #0"
> > +                   : "=&r" (res), "+&r" (tmp)
> > +                   : "r" (m), "r" (n)
> 
> Why is tmp using a +r constraint here?  The register is not written, so
> using an input-only operand could/should result in better code.  That is
> also what the old code did.

No, that produces worse code. With an input-only operand, gcc allocates 
two registers because, somehow, it doesn't realize that the first one 
still holds zero after its first use. Making tmp a read-write operand 
forces a single temporary register to be used throughout, which produces 
better code.

I meant to have this split out in a separate patch but messed it up 
somehow.

> 
> > +                   : "cc");
> > +   }
> > +
> > +   if (!(m & ((1ULL << 63) | (1ULL << 31)))) {
> > +           asm (   "umlal  %R0, %Q0, %R1, %Q2\n\t"
> > +                   "umlal  %R0, %Q0, %Q1, %R2\n\t"
> > +                   "mov    %R0, #0\n\t"
> > +                   "umlal  %Q0, %R0, %R1, %R2"
> > +                   : "+&r" (res)
> > +                   : "r" (m), "r" (n)
> > +                   : "cc");
> > +   } else {
> > +           asm (   "umlal  %R0, %Q0, %R2, %Q3\n\t"
> > +                   "umlal  %R0, %1, %Q2, %R3\n\t"
> > +                   "mov    %R0, #0\n\t"
> > +                   "adds   %Q0, %1, %Q0\n\t"
> > +                   "adc    %R0, %R0, #0\n\t"
> > +                   "umlal  %Q0, %R0, %R2, %R3"
> > +                   : "+&r" (res), "+&r" (tmp)
> > +                   : "r" (m), "r" (n)
> > +                   : "cc");
> > +   }
> > +
> > +   return res;
> > +}
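
For readers following the arithmetic rather than the register allocation,
here is a minimal portable C sketch of what the quoted function appears to
compute. It assumes the intended result is ((bias ? m : 0) + m * n) >> 64,
that is, the high 64 bits of the 128-bit product plus an optional bias of m.
The name xprod_64_sketch is hypothetical and is not part of the patch; the
code only mirrors the 32x32 partial-product structure of the umull/umlal
sequences and makes no attempt to match their instruction count.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reference version (not from the patch).
 * Assumed semantic: ((bias ? m : 0) + m * n) >> 64.
 */
static inline uint64_t xprod_64_sketch(uint64_t m, uint64_t n, bool bias)
{
	uint32_t m_lo = (uint32_t)m, m_hi = (uint32_t)(m >> 32);
	uint32_t n_lo = (uint32_t)n, n_hi = (uint32_t)(n >> 32);

	/* Four 32x32->64 partial products, one per umull/umlal. */
	uint64_t ll = (uint64_t)m_lo * n_lo;
	uint64_t lh = (uint64_t)m_lo * n_hi;
	uint64_t hl = (uint64_t)m_hi * n_lo;
	uint64_t hh = (uint64_t)m_hi * n_hi;

	/* Optional bias term, added to the low 64 bits of the product. */
	uint64_t add = bias ? m : 0;

	/* Sum 32-bit columns from least to most significant, carrying up. */
	uint64_t c0 = (uint32_t)ll + (uint64_t)(uint32_t)add;
	uint64_t c1 = (c0 >> 32) + (ll >> 32) + (uint32_t)lh + (uint32_t)hl
		      + (add >> 32);
	uint64_t c2 = (c1 >> 32) + (lh >> 32) + (hl >> 32) + (uint32_t)hh;
	uint64_t c3 = (c2 >> 32) + (hh >> 32);

	/* Columns 2 and 3 form the high 64 bits of the 128-bit sum. */
	return (c3 << 32) | (uint32_t)c2;
}
```

Each column sum stays far below 2^64, so the sketch needs no overflow
special case. The asm version instead accumulates with umlal, which is
presumably why it first tests bits 63 and 31 of m before taking the
shorter path.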
