Krishna Yenduri wrote:
> Hi Vladimir,
> 
>> Any advice on what the micro benchmark should test ?
> 
> I will send them to you separately.
> 
> There shouldn't be any performance impact with software providers only case
>  after your latest change. So, the interesting part is how much is the
> impact when n2cp is present. I expect this to be insignificant. Best to 
> confirm it though.

The results of the AES microbenchmark runs (5 repetitions of each test) 
on a T5220 machine (64 HW threads, each 1167 MHz) are rather interesting:

AES, 64 bytes of data:
threads         avg before      avg after       change in percent
1               4629600         4705600         1.64
2               9099000         9234400         1.49
4               18148000        18492200        1.90
8               32051400        33205200        3.60
16              48161200        51696400        7.34
32              82082400        78998600        -3.76
64              132646000       130971000       -1.26
average                                         1.56

AES, 512 bytes of data:
threads         avg before      avg after       change in percent
1               47475200        47350600        -0.26
2               93164000        92904800        -0.28
4               185580000       184444800       -0.61
8               371351600       369013400       -0.63
16              421443600       414700400       -1.60
32              527972800       523183800       -0.91
64              615777400       613520400       -0.37
average                                         -0.67

The test environments (before, after) were constructed using bfu from 
non-DEBUG bits compiled from 2 repositories, where the only difference 
was the changeset with my changes.

In sum, the average impact is small: +1.56 percent for 64 bytes of data 
and -0.67 percent for 512 bytes of data.
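
The "change in percent" column can be reproduced from the two averages. 
A minimal sketch in Python, assuming the column is computed as 
(after - before) / before * 100 and that the "average" row is the plain 
mean over the seven thread counts. The names below are illustrative and 
are not part of the benchmark itself:

```python
# Rows are (threads, avg before, avg after), copied from the tables above.
AES_64 = [
    (1, 4629600, 4705600),
    (2, 9099000, 9234400),
    (4, 18148000, 18492200),
    (8, 32051400, 33205200),
    (16, 48161200, 51696400),
    (32, 82082400, 78998600),
    (64, 132646000, 130971000),
]

AES_512 = [
    (1, 47475200, 47350600),
    (2, 93164000, 92904800),
    (4, 185580000, 184444800),
    (8, 371351600, 369013400),
    (16, 421443600, 414700400),
    (32, 527972800, 523183800),
    (64, 615777400, 613520400),
]


def percent_change(before, after):
    """Relative change of 'after' against 'before', in percent."""
    return (after - before) / before * 100.0


def summarize(rows):
    """Return the per-row percent changes and their plain mean."""
    changes = [percent_change(before, after) for _, before, after in rows]
    return changes, sum(changes) / len(changes)


if __name__ == "__main__":
    for label, rows in (("AES 64", AES_64), ("AES 512", AES_512)):
        changes, mean = summarize(rows)
        for (threads, _, _), change in zip(rows, changes):
            print(f"{label:8} {threads:3d} threads  {change:+.2f}")
        print(f"{label:8} average      {mean:+.2f}")
```

The mean is taken over the unrounded per-row values; averaging the 
rounded values instead gives the same result to two decimal places for 
these numbers.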

However, there are 2 interesting anomalies:
   - the case of AES encrypting 64 bytes of data with CKM_AES_CBC actually 
reports a small performance improvement on average (mainly because of 
the improvement with 16 threads)
     - I don't know how to interpret this
   - the case of AES encrypting 512 bytes of data with CKM_AES_CBC has an 
interesting quirk with 16 threads
     - possibly related to the 8 MAU processing units in UltraSPARC-T2
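
One way to look at the 16-thread behaviour in the 512-byte case is the 
ratio between the results at consecutive thread counts. This is only one 
possible reading of the quirk, and it says nothing about the cause. A 
minimal sketch in Python, using the "avg before" and "avg after" columns 
from the table (the function name is illustrative):

```python
THREADS = [1, 2, 4, 8, 16, 32, 64]

# "avg before" and "avg after" columns of the 512-byte table.
AES_512_BEFORE = [47475200, 93164000, 185580000, 371351600,
                  421443600, 527972800, 615777400]
AES_512_AFTER = [47350600, 92904800, 184444800, 369013400,
                 414700400, 523183800, 613520400]


def scaling_factors(values):
    """Ratio of each result to the one at the previous thread count."""
    return [values[i + 1] / values[i] for i in range(len(values) - 1)]


if __name__ == "__main__":
    before = scaling_factors(AES_512_BEFORE)
    after = scaling_factors(AES_512_AFTER)
    for i, (b, a) in enumerate(zip(before, after)):
        print(f"{THREADS[i]:2d} -> {THREADS[i + 1]:2d} threads: "
              f"before x{b:.2f}, after x{a:.2f}")
```

In both columns the result roughly doubles with each doubling of the 
thread count up to 8 threads, and the step from 8 to 16 threads is much 
smaller.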

Also, when I tried to measure the on-CPU overhead of 
kcf_check_prov_mech_keylen() via dtrace (while also gathering 
elapsed+on-CPU times of fbt:kcf:kcf_get_*_provider:), the dtrace overhead 
made the test produce numbers roughly 5 times smaller, but this might be 
a bug in my script.


v.
