You are right. I measured the throughput and latency of the vncipher and vxor
instructions on POWER8 and updated the patch accordingly.
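
The rule of thumb in the quoted message below (the number of independent blocks needed is roughly latency times throughput) can be written as a tiny helper. This is a minimal sketch: the function name is hypothetical, and throughput is passed as a fraction tp_num / tp_den instructions per cycle so the ceiling can be taken in integer arithmetic.

```c
#include <assert.h>

/* Hypothetical helper: smallest number of independent instruction
   chains that keeps a pipelined unit busy, i.e.
   ceil(latency_cycles * tp_num / tp_den), where tp_num / tp_den is the
   throughput in instructions per cycle. */
static unsigned
interleave_factor(unsigned latency_cycles, unsigned tp_num, unsigned tp_den)
{
  assert(latency_cycles > 0 && tp_num > 0 && tp_den > 0);
  unsigned long long prod = (unsigned long long) latency_cycles * tp_num;
  /* Integer ceiling division avoids floating-point rounding issues. */
  return (unsigned) ((prod + tp_den - 1) / tp_den);
}
```

With purely illustrative figures (not measured POWER8 values), a latency of 6 cycles at 2 instructions per cycle gives interleave_factor(6, 2, 1) == 12, and a latency of 7 cycles at one instruction every 2 cycles gives interleave_factor(7, 1, 2) == 4. In a loop like the one quoted below, where each block's vncipher feeds a vxor, the latency to plug in is arguably the length of that per-round dependency chain rather than the vncipher latency alone.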

On Thu, Jul 9, 2020 at 5:58 PM Niels Möller <ni...@lysator.liu.se> wrote:

> Maamoun TK <maamoun...@googlemail.com> writes:
>
> > +L16x_round_loop:
> > + lxvd2x KX,10,KEYS
> > + vperm   K,K,K,swap_mask
> > + vncipher S0,S0,ZERO
> > + vncipher S1,S1,ZERO
> > + vncipher S2,S2,ZERO
> > + vncipher S3,S3,ZERO
> > + vncipher S4,S4,ZERO
> > + vncipher S5,S5,ZERO
> > + vncipher S6,S6,ZERO
> > + vncipher S7,S7,ZERO
> > + vncipher S8,S8,ZERO
> > + vncipher S9,S9,ZERO
> > + vncipher S10,S10,ZERO
> > + vncipher S11,S11,ZERO
> > + vncipher S12,S12,ZERO
> > + vncipher S13,S13,ZERO
> > + vncipher S14,S14,ZERO
> > + vncipher S15,S15,ZERO
> > + vxor S0,S0,K
> > + vxor S1,S1,K
> > + vxor S2,S2,K
> > + vxor S3,S3,K
> > + vxor S4,S4,K
> > + vxor S5,S5,K
> > + vxor S6,S6,K
> > + vxor S7,S7,K
> > + vxor S8,S8,K
> > + vxor S9,S9,K
> > + vxor S10,S10,K
> > + vxor S11,S11,K
> > + vxor S12,S12,K
> > + vxor S13,S13,K
> > + vxor S14,S14,K
> > + vxor S15,S15,K
> > + addi 10,10,0x10
> > + bdnz L16x_round_loop
>
> Do you really need to go all the way to 16 blocks in parallel to
> saturate the execution units? I'm used to defining throughput and
> latency of an instruction (e.g., vncipher) as follows:
>
> Throughput: The number of independent vncipher instructions that can be
> executed per cycle. Can be measured by benchmarking a loop of
> independent instructions.
>
> Latency: The number of cycles from the start of execution of a vncipher
> instruction until execution of an instruction depending on the vncipher
> result can start. Can be measured by benchmarking a loop where each
> instruction depends on the result of the preceding instruction.
>
> Do you know throughput and latency of the vncipher and vxor
> instructions? (Official manuals are not always to be trusted.) Those
> numbers determine how much parallelism is needed, typically the product
> of latency and throughput.
>
> Regards,
> /Niels
>
> --
> Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
> Internet email is subject to wholesale government surveillance.
>
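
The measurement method described in the quoted message (a loop of independent instructions for throughput, a dependent chain for latency) can be sketched as a small C harness. This is an illustration under stated assumptions: all function names are hypothetical, and a portable 64-bit multiply-add stands in for vncipher, which on POWER8 would have to be emitted via inline assembly or a compiler builtin instead.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for the instruction under test (assumption: on POWER8 this
   would be a vncipher rather than a multiply-add). */
static inline uint64_t
op(uint64_t x)
{
  return x * UINT64_C(0x9E3779B97F4A7C15) + 1;
}

/* Latency loop: each op depends on the previous result. */
static uint64_t
dependent_chain(uint64_t x, long n)
{
  for (long i = 0; i < n; i++)
    x = op(x);
  return x;
}

/* Throughput loop: four independent chains interleaved. Runs n ops,
   rounded up to a multiple of 4. */
static uint64_t
independent_chains(uint64_t x, long n)
{
  uint64_t a = x, b = x + 1, c = x + 2, d = x + 3;
  for (long i = 0; i < n; i += 4)
    {
      a = op(a); b = op(b); c = op(c); d = op(d);
    }
  return a ^ b ^ c ^ d;
}

static volatile uint64_t sink; /* keeps results live */

/* Wall-clock nanoseconds per op for loop function fn over n ops. */
static double
ns_per_op(uint64_t (*fn)(uint64_t, long), long n)
{
  struct timespec t0, t1;
  assert(n > 0);
  clock_gettime(CLOCK_MONOTONIC, &t0);
  sink = fn(sink, n);
  clock_gettime(CLOCK_MONOTONIC, &t1);
  double ns = (double) (t1.tv_sec - t0.tv_sec) * 1e9
              + (double) (t1.tv_nsec - t0.tv_nsec);
  return ns / (double) n;
}
```

Calling ns_per_op(dependent_chain, n) and ns_per_op(independent_chains, n) with a large n, the ratio of the two figures roughly estimates latency times throughput, as long as that product is below the number of chains (four here); a ratio close to four suggests repeating the test with more chains. It is worth checking the generated assembly first, since the compiler may reorder or vectorize these loops.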
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
