You are right. I measured the throughput and latency of the vncipher and vxor instructions on POWER8 and updated the patch accordingly.
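The sizing rule from the quoted message can be written down directly. The number of blocks to process in parallel is roughly latency times throughput, rounded up to whole blocks. Below is a minimal C sketch. It assumes throughput is expressed as a ratio of instructions issued per number of cycles. The function name `blocks_needed` is hypothetical and is not part of Nettle or the patch.

```c
#include <assert.h>

/* Hypothetical helper: minimum number of independent blocks to interleave
 * so that a pipelined instruction (e.g. vncipher) never waits on its own
 * result. Throughput is given as `ops` instructions per `cycles` cycles,
 * so one instruction every two cycles is ops = 1, cycles = 2.
 * Returns ceil (latency * ops / cycles), i.e. the latency-throughput
 * product rounded up to a whole number of blocks. */
static unsigned
blocks_needed (unsigned latency, unsigned ops, unsigned cycles)
{
  assert (cycles > 0);
  return (latency * ops + cycles - 1) / cycles;
}
```

The following figures are hypothetical and are not measured POWER8 values. A latency of 6 cycles with one instruction per cycle gives `blocks_needed (6, 1, 1) == 6`. A latency of 7 with two per cycle gives 14. A latency of 6 with one instruction every two cycles gives 3.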
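The L16x_round_loop in the quoted patch works in round-major order. Each round key is applied to all 16 states before the loop moves on to the next key, so consecutive vncipher instructions do not depend on one another. The portable C model below illustrates that reordering. `toy_round` is a hypothetical stand-in for vncipher with a zero key and is not AES. The 32-bit state, the round count, and all function names are assumptions chosen for illustration. The model only shows that the two loop orders compute the same result. Any speedup depends on the hardware pipeline.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS 16
#define NROUNDS 10

/* Hypothetical stand-in for one vncipher step (NOT AES): a cheap
 * invertible 32-bit mix (xorshift, then multiply by an odd constant). */
static uint32_t
toy_round (uint32_t s)
{
  s ^= s >> 7;
  s *= 0x9E3779B1u;
  return s;
}

/* Block-major order: each block runs all rounds before the next block
 * starts, so every step depends on the result of the previous one. */
static void
process_sequential (uint32_t state[NBLOCKS], const uint32_t keys[NROUNDS])
{
  for (int b = 0; b < NBLOCKS; b++)
    for (int r = 0; r < NROUNDS; r++)
      state[b] = toy_round (state[b]) ^ keys[r];
}

/* Round-major order, mirroring the patch: apply the round step to all
 * NBLOCKS states, then XOR the round key into all of them. The NBLOCKS
 * updates within each inner loop are independent of each other. */
static void
process_interleaved (uint32_t state[NBLOCKS], const uint32_t keys[NROUNDS])
{
  for (int r = 0; r < NROUNDS; r++)
    {
      for (int b = 0; b < NBLOCKS; b++)
        state[b] = toy_round (state[b]);
      for (int b = 0; b < NBLOCKS; b++)
        state[b] ^= keys[r];
    }
}

/* Returns 1 if both loop orders produce identical output for `input`. */
static int
orders_agree (const uint32_t input[NBLOCKS], const uint32_t keys[NROUNDS])
{
  uint32_t a[NBLOCKS], b[NBLOCKS];
  assert (input != NULL && keys != NULL);
  memcpy (a, input, sizeof a);
  memcpy (b, input, sizeof b);
  process_sequential (a, keys);
  process_interleaved (b, keys);
  return memcmp (a, b, sizeof a) == 0;
}
```

Each block still passes through the same sequence of operations, which is why the outputs match. Only the order in which independent work is issued changes.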
On Thu, Jul 9, 2020 at 5:58 PM Niels Möller <ni...@lysator.liu.se> wrote:

> Maamoun TK <maamoun...@googlemail.com> writes:
>
> > +L16x_round_loop:
> > +        lxvd2x KX,10,KEYS
> > +        vperm K,K,K,swap_mask
> > +        vncipher S0,S0,ZERO
> > +        vncipher S1,S1,ZERO
> > +        vncipher S2,S2,ZERO
> > +        vncipher S3,S3,ZERO
> > +        vncipher S4,S4,ZERO
> > +        vncipher S5,S5,ZERO
> > +        vncipher S6,S6,ZERO
> > +        vncipher S7,S7,ZERO
> > +        vncipher S8,S8,ZERO
> > +        vncipher S9,S9,ZERO
> > +        vncipher S10,S10,ZERO
> > +        vncipher S11,S11,ZERO
> > +        vncipher S12,S12,ZERO
> > +        vncipher S13,S13,ZERO
> > +        vncipher S14,S14,ZERO
> > +        vncipher S15,S15,ZERO
> > +        vxor S0,S0,K
> > +        vxor S1,S1,K
> > +        vxor S2,S2,K
> > +        vxor S3,S3,K
> > +        vxor S4,S4,K
> > +        vxor S5,S5,K
> > +        vxor S6,S6,K
> > +        vxor S7,S7,K
> > +        vxor S8,S8,K
> > +        vxor S9,S9,K
> > +        vxor S10,S10,K
> > +        vxor S11,S11,K
> > +        vxor S12,S12,K
> > +        vxor S13,S13,K
> > +        vxor S14,S14,K
> > +        vxor S15,S15,K
> > +        addi 10,10,0x10
> > +        bdnz L16x_round_loop
>
> Do you really need to go all the way to 16 blocks in parallel to
> saturate the execution units? I'm used to defining throughput and
> latency of an instruction (e.g., vncipher) as follows:
>
> Throughput: The number of independent vncipher instructions that can be
> executed per cycle. Can be measured by benchmarking a loop of
> independent instructions.
>
> Latency: The number of cycles from the start of execution of a vncipher
> instruction until execution of an instruction depending on the vncipher
> result can start. Can be measured by benchmarking a loop where each
> instruction depends on the result of the preceding instruction.
>
> Do you know throughput and latency of the vncipher and vxor
> instructions? (Official manuals are not always to be trusted). Those
> numbers determines how much parallelism is needed, typically the product
> of latency and throughput.
>
> Regards,
> /Niels
>
> --
> Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
> Internet email is subject to wholesale government surveillance.

_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs