Amitay Isaacs <[email protected]> writes:

> +     .file "ecc-secp192r1-modp.asm"

Thanks, I'm looking at this file first (being the simplest, even though
the security level of this curve is a bit low for current usage, so
performance not of so great importance).

I'm quite new to powerpc, so I'm refering to the instruction reference,
and trying to learn as we go along. It seems addc is addition with carry
output (but no carry input), adde is addition with carry input and
output, and addze is addition of zero with carry input and output.

> +define(`RP', `r4')
> +define(`XP', `r5')
> +
> +define(`T0', `r6')
> +define(`T1', `r7')
> +define(`T2', `r8')
> +define(`T3', `r9')
> +define(`C1', `r10')
> +define(`C2', `r11')

As I understand it, we could also use register r3 (unused input
argument), but we don't need to, since we have enough free scratch
registers.

> +     C void ecc_secp192r1_modp (const struct ecc_modulo *m, mp_limb_t *rp)
> +     .text
> +define(`FUNC_ALIGN', `5')
> +PROLOGUE(_nettle_ecc_secp192r1_modp)
> +     ld      T0, 0(XP)
> +     ld      T1, 8(XP)
> +     ld      T2, 16(XP)
> +
> +     li      C1, 0
> +     li      C2, 0
> +
> +     ld      T3, 24(XP)
> +     addc    T0, T3, T0
> +     adde    T1, T3, T1
> +     addze   T2, T2
> +     addze   C1, C1
> +
> +     ld      T3, 32(XP)
> +     addc    T1, T3, T1
> +     adde    T2, T3, T2
> +     addze   C1, C1
> +
> +     ld      T3, 40(XP)
> +     addc    T0, T3, T0
> +     adde    T1, T3, T1
> +     adde    T2, T3, T2
> +     addze   C1, C1

To analyze what we are doing, I'm using the Nettle and GMP convention
that B = 2^64 (bignum base), then p = B^3 - B - 1, or B^3 = B + 1 (mod
p). Denote the six input words as

  <a_5,a_4,a_3,a_2,a_1,a_0>

representing the number 

  B^5 a_5 + B^4 a_4 + B^3 a_3 + B^2 a_2 + B a_1 + a_0

The accumulation above, as I understand it, computes

  <c_1,t_2,t_1,t_0> = <a_2,a_1,a_0> + a_3 (B+1) + a_4 (B^2 + B) 
                    + a_5 (B^2 + B + 1>

or more graphically,

      a_2 a_2 a_1
          a_3 a_3
      a_4 a_4
    + a_5 a_5 a_5
  ---------------
  c_1 t_2 t_1 t_0

This number is < 3 B^3, which means that c_1 is 0, 1 or 2 (each of the
addze instructions can increment it).

This looks nice, and I think it is pretty efficient too. It looks a bit
different from what the x86_64 code is doing; maybe the latter could be
improved.

> +     addc    T0, C1, T0
> +     adde    T1, C1, T1
> +     addze   T2, T2
> +     addze   C2, C2

Above, c_1 is folded in at the right places, 

  <c_2,t_2,t_1,t_0>  <--  <t_2, t_1, t_0> + c_1 (B + 1)

This number is < B^3 + 3 (B+1). This implies that in the (quite
unlikely) case we get carry out, i.e., c_2 = 1, then the value of the
low three words is < 3 (B+1). That means that there can be no new carry
out when folding c_2.

> +     li      C1, 0
> +     addc    T0, C2, T0
> +     adde    T1, C2, T1
> +     addze   T2, T2
> +     addze   C1, C1
> +
> +     addc    T0, C1, T0
> +     adde    T1, C1, T1
> +     addze   T2, T2

So I think this final folding could be reduced to just

        addc    T0, C2, T0
        adde    T1, C2, T1
        addze   T2, T2

There's no carry out, from this, because either C2 was zero, or T2 was
small, <= 3. Does that make sense?

> +     std     T0, 0(RP)
> +     std     T1, 8(RP)
> +     std     T2, 16(RP)
> +
> +     blr
> +EPILOGUE(_nettle_ecc_secp192r1_modp)

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs

Reply via email to