On Wed, Jun 02, 2021 at 07:27:35PM -0500, Segher Boessenkool wrote:
> On Wed, Jun 02, 2021 at 05:13:15PM -0500, Paul A. Clarke wrote:
> > Add a naive implementation of the subject x86 intrinsic to
> > ease porting.
> 
> > +/* Return horizontal packed word minimum and its index in bits [15:0]
> > +   and bits [18:16] respectively.  */
> > +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_minpos_epu16 (__m128i __A)
> > +{
> > +  union __u
> > +    {
> > +      __m128i __m;
> > +      __v8hu __uh;
> > +    };
> > +  union __u __u = { .__m = __A }, __r = { .__m = {0} };
> > +  unsigned short __ridx = 0;
> > +  unsigned short __rmin = __u.__uh[__ridx];
> > +  for (unsigned long __i = __ridx+1;
> 
> (spaces around the "+"?)

ok

> 
> > +       __i < sizeof (__u.__uh) / sizeof (__u.__uh[0]);
> 
> You should either use a macro for that, or just write "8" :-)

OK. (There should be a standard facility for computing an array's element count.)
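
For illustration only, the usual idiom is a small local macro. This is a sketch, not part of the patch: `ARRAY_SIZE` and `halfwords_in_buffer` are hypothetical names, and the macro is only valid for true arrays, not pointers.

```c
#include <stddef.h>

/* Hypothetical helper: element count of a true array.
   Gives a wrong answer if passed a pointer, so use it with care.  */
#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

/* Hypothetical example: a buffer of eight 16-bit values, standing in
   for the eight unsigned halfwords of a 128-bit vector.  */
static size_t
halfwords_in_buffer (void)
{
  unsigned short buf[8] = { 0 };
  return ARRAY_SIZE (buf);
}
```

With exactly eight elements known up front, writing the literal 8 is just as clear.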

> > +       __i++)
> > +    {
> > +      if (__u.__uh[__i] < __rmin)
> > +        {
> > +          __rmin = __u.__uh[__i];
> > +          __ridx = __i;
> > +        }
> > +    }
> > +  __r.__uh[0] = __rmin;
> > +  __r.__uh[1] = __ridx;
> > +  return __r.__m;
> > +}
> 
> This does not compute the index correctly for big endian (it needs to
> walk from right to left for that).  The construction of the return value
> looks wrong as well.
> 
> Okay for trunk with that fixed.  Thanks!

I'm not seeing the issue here. The values are numbered by element order,
and the results are in the "first" (minimum value) and "second" (index of
first encountered minimum value in element order) elements of the result.
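
To make the layout I am describing concrete, here is a minimal scalar sketch over a plain array in element order. It is not the patch itself: `minpos_u16_ref` is a hypothetical name, and it says nothing about how the elements map onto register bits on a given endianness.

```c
#include <stdint.h>

/* Hypothetical scalar reference: scan the eight elements in element
   order, keep the first-encountered minimum, and store the minimum in
   result element 0 and its index in result element 1.  All other
   result elements are zero.  */
static void
minpos_u16_ref (const uint16_t in[8], uint16_t out[8])
{
  uint16_t rmin = in[0];
  uint16_t ridx = 0;

  for (unsigned i = 1; i < 8; i++)
    {
      /* Strict "<" so that ties keep the earliest index.  */
      if (in[i] < rmin)
        {
          rmin = in[i];
          ridx = (uint16_t) i;
        }
    }

  for (unsigned i = 0; i < 8; i++)
    out[i] = 0;
  out[0] = rmin;
  out[1] = ridx;
}
```

For the input { 5, 3, 9, 3, 7, 8, 6, 4 }, this stores 3 in element 0 and 1 in element 1, since the first 3 is encountered at index 1.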

PC
