Hi,

Per request, I collected runtime performance and code size data with CPU2017 
on an x86 platform. 

*** Machine info:
model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
$ lscpu | grep NUMA
NUMA node(s):          2
NUMA node0 CPU(s):     0-21,44-65
NUMA node1 CPU(s):     22-43,66-87

***CPU2017 benchmarks: 
All the benchmarks written in C/C++: 9 integer benchmarks and 10 FP 
benchmarks. 

***Configurations:
intrate and fprate, 22 copies. 

***Compiler options:
no:            -g -O2 -march=native
used_gpr_arg:  no + -fzero-call-used-regs=used-gpr-arg
used_arg:      no + -fzero-call-used-regs=used-arg
all_arg:       no + -fzero-call-used-regs=all-arg
used_gpr:      no + -fzero-call-used-regs=used-gpr
all_gpr:       no + -fzero-call-used-regs=all-gpr
used:          no + -fzero-call-used-regs=used
all:           no + -fzero-call-used-regs=all

***Each benchmark runs 3 times. 

***runtime performance data:
Please see the attached csv file. 


From the data, we can see that:
On average, all the options starting with “used_…” (i.e., only the registers 
that are actually used in the routine are zeroed) have very low runtime 
overheads: at most 1.72% for the integer benchmarks and 1.17% for the FP 
benchmarks. 
If all registers are zeroed, the runtime overhead is larger: on average for 
the integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56%. 
It looks like the overhead of zeroing the vector registers is much larger. 

For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, and 
its runtime overhead is very small.
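For reference, the overhead percentages above are the usual relative 
slowdown of an option's run time against the “no” baseline; a small sketch 
with hypothetical run times (the real data is in the attached csv):

```python
def overhead_pct(t_base, t_opt):
    """Runtime overhead of an option relative to the baseline, in percent."""
    return (t_opt / t_base - 1.0) * 100.0

# Hypothetical run times in seconds, chosen only to illustrate the formula.
baseline = 100.0
with_flag = 101.72
print(round(overhead_pct(baseline, with_flag), 2))  # prints 1.72
```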

***code size increase data:

Please see the attached file. 


From the data, we can see that:
The code size impact is in general very small; the biggest is “all_arg”, at 
1.06% for the integer benchmarks and 1.13% for the FP benchmarks.

So, from the data collected, I think that the runtime overhead and code size 
increase from this option are very reasonable. 

Let me know your comments and opinions.

thanks.

Qing

> On Aug 25, 2020, at 4:54 PM, Qing Zhao via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <seg...@kernel.crashing.org> 
>> wrote:
>> 
>> Hi!
>> 
>> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool 
>>>> <seg...@kernel.crashing.org> wrote:
>>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool 
>>>>>> <seg...@kernel.crashing.org> wrote:
>>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>>> use it, it helps security at most none at all :-(
>>>> 
>>>> Without numbers on this, no one can determine if it is a good tradeoff
>>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>>> enough users that it will be worth the effort for us.  Which is why I
>>>> keep hammering on this point.
>>> I can collect some run-time overhead data on this, do you have a 
>>> recommendation on what test suite I can use
>>> For this testing? (Is CPU2017 good enough)?
>> 
>> I would use something more real-life, not 12 small pieces of code.
> 
> There is some basic information about the benchmarks of CPU2017 in below link:
> 
> https://www.spec.org/cpu2017/Docs/overview.html#suites
> 
> GCC itself is one of the benchmarks in CPU2017 (502.gcc_r). And 526.blender_r 
> is even larger than 502.gcc_r. 
> And there are several other quite big benchmarks as well (perlbench, 
> xalancbmk, parest, imagick, etc).
> 
> thanks.
> 
> Qing
