Reply: Re: Is fcommon related with performance optimization logic?

Zhaohaifeng(Clark,CIS-HCE) via Gcc Fri, 31 May 2024 01:42:44 -0700

Thanks.

The relevant UnixBench source code is as follows:


unsigned long   Run_Index;
Rec_Pointer     Ptr_Glob,
                Next_Ptr_Glob;
int             Int_Glob;
Boolean         Bool_Glob;
char            Ch_1_Glob,
                Ch_2_Glob;
int             Arr_1_Glob [50];
int             Arr_2_Glob [50] [50];
Boolean         Reg = true;
long            Begin_Time,
                End_Time,
                User_Time;
float           Microseconds,
                Dhrystones_Per_Second;
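To study the layout outside of UnixBench, the declarations above can be put into a small self-contained file. This is a minimal sketch under stated assumptions: `Boolean` and `Rec_Pointer` are simplified stand-ins for the Dhrystone typedefs, `Reg` is initialized to 0 (rather than `true`) so that it is placed in .bss like the others, and `struct layout_entry`, `N_GLOBALS`, `collect_layout` and `print_layout` are hypothetical names. `collect_layout` returns the variables sorted by address, similar to what `nm -n -S` shows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins (assumption); Dhrystone defines these in its header. */
typedef int Boolean;
typedef struct record *Rec_Pointer;

/* Same declarations, in the same order, as the UnixBench excerpt above. */
unsigned long   Run_Index;
Rec_Pointer     Ptr_Glob,
                Next_Ptr_Glob;
int             Int_Glob;
Boolean         Bool_Glob;
char            Ch_1_Glob,
                Ch_2_Glob;
int             Arr_1_Glob [50];
int             Arr_2_Glob [50] [50];
Boolean         Reg = 0;    /* assumption: 0 instead of true, so it stays in .bss */
long            Begin_Time,
                End_Time,
                User_Time;
float           Microseconds,
                Dhrystones_Per_Second;

#define N_GLOBALS 15

/* Hypothetical record: one row of an address-sorted listing. */
struct layout_entry {
    const char *name;
    uintptr_t   addr;
    size_t      size;
};

#define ENTRY(v) { #v, (uintptr_t)&(v), sizeof(v) }

static int by_address(const void *a, const void *b)
{
    uintptr_t x = ((const struct layout_entry *)a)->addr;
    uintptr_t y = ((const struct layout_entry *)b)->addr;
    return (x > y) - (x < y);
}

/* Copy all globals into `out` (capacity `cap`), sorted by address; returns the count. */
size_t collect_layout(struct layout_entry *out, size_t cap)
{
    const struct layout_entry all[N_GLOBALS] = {
        ENTRY(Run_Index), ENTRY(Ptr_Glob), ENTRY(Next_Ptr_Glob),
        ENTRY(Int_Glob), ENTRY(Bool_Glob), ENTRY(Ch_1_Glob),
        ENTRY(Ch_2_Glob), ENTRY(Arr_1_Glob), ENTRY(Arr_2_Glob),
        ENTRY(Reg), ENTRY(Begin_Time), ENTRY(End_Time),
        ENTRY(User_Time), ENTRY(Microseconds), ENTRY(Dhrystones_Per_Second),
    };
    assert(cap >= N_GLOBALS);
    for (size_t i = 0; i < N_GLOBALS; i++)
        out[i] = all[i];
    qsort(out, N_GLOBALS, sizeof out[0], by_address);
    return N_GLOBALS;
}

/* Print address, size and name, lowest address first. */
void print_layout(void)
{
    struct layout_entry e[N_GLOBALS];
    size_t n = collect_layout(e, N_GLOBALS);
    for (size_t i = 0; i < n; i++)
        printf("%#018jx  %#06zx  %s\n",
               (uintmax_t)e[i].addr, e[i].size, e[i].name);
}
```

Calling `print_layout()` from a `main` in one build with `-fcommon` and one with `-fno-common` lets the two orderings be compared directly; the absolute addresses will of course differ from the UnixBench binary.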

Some key results are as follows:

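1.       With gcc 10.3, the variables are laid out in reverse declaration order, 
from the last-declared Dhrystones_Per_Second (lowest address) to the 
first-declared Run_Index (highest address), both in the assembly and in the 
final binary. The address-sorted symbol listing (columns: address, size, symbol 
type, name) is: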
0x00000000004040c0   0x0000000000000008   B      stderr@GLIBC_2.2.5
0x00000000004040c8   0x0000000000000001   b      completed.0
0x00000000004040e0   0x0000000000000004   B      Dhrystones_Per_Second
0x00000000004040e4   0x0000000000000004   B      Microseconds
0x00000000004040e8   0x0000000000000008   B      User_Time
0x00000000004040f0   0x0000000000000008   B      End_Time
0x00000000004040f8   0x0000000000000008   B      Begin_Time
0x0000000000404100   0x0000000000000004   B      Reg
0x0000000000404120   0x0000000000002710   B      Arr_2_Glob
0x0000000000406840   0x00000000000000c8   B      Arr_1_Glob
0x0000000000406908   0x0000000000000001   B      Ch_2_Glob
0x0000000000406909   0x0000000000000001   B      Ch_1_Glob
0x000000000040690c   0x0000000000000004   B      Bool_Glob
0x0000000000406910   0x0000000000000004   B      Int_Glob
0x0000000000406918   0x0000000000000008   B      Next_Ptr_Glob
0x0000000000406920   0x0000000000000008   B      Ptr_Glob
0x0000000000406928   0x0000000000000008   B      Run_Index

With gcc 10.3, if we change the order of the variables in the source code, their 
order in the assembly and in the binary changes accordingly, following the same 
reverse-order logic.
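The observation in point 1 can be checked mechanically: take the addresses in declaration order and test whether they strictly decrease. A minimal sketch, with a hypothetical helper `is_reverse_declaration_order` and the addresses copied from the gcc 10.3 listing above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: `addr` lists the variables' addresses in source
 * declaration order. Returns true if they strictly decrease, i.e. the
 * first-declared variable sits at the highest address. */
bool is_reverse_declaration_order(const uintptr_t *addr, size_t n)
{
    assert(addr != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (addr[i] >= addr[i - 1])
            return false;
    return true;
}

/* Addresses from the gcc 10.3 listing, in declaration order:
 * Run_Index, Ptr_Glob, Next_Ptr_Glob, Int_Glob, Bool_Glob,
 * Ch_1_Glob, Ch_2_Glob, Arr_1_Glob, Arr_2_Glob, Reg,
 * Begin_Time, End_Time, User_Time, Microseconds, Dhrystones_Per_Second. */
const uintptr_t gcc103_addrs[] = {
    0x406928, 0x406920, 0x406918, 0x406910, 0x40690c,
    0x406909, 0x406908, 0x406840, 0x404120, 0x404100,
    0x4040f8, 0x4040f0, 0x4040e8, 0x4040e4, 0x4040e0,
};
```

Fed the gcc 8.5 addresses in declaration order instead, the same check fails at the first step, because Run_Index (0x406818) sits below Ptr_Glob (0x406828).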


2.       With gcc 8.5, the variables are arranged as follows, both in the 
assembly and in the final binary:
0x00000000004040c0   0x0000000000000008   B      stderr@GLIBC_2.2.5
0x00000000004040c8   0x0000000000000001   b      completed.0
0x00000000004040e0   0x0000000000000008   B      Begin_Time
0x0000000000404100   0x0000000000002710   B      Arr_2_Glob
0x0000000000406810   0x0000000000000001   B      Ch_2_Glob
0x0000000000406818   0x0000000000000008   B      Run_Index
0x0000000000406820   0x0000000000000004   B      Microseconds
0x0000000000406828   0x0000000000000008   B      Ptr_Glob
0x0000000000406830   0x0000000000000004   B      Dhrystones_Per_Second
0x0000000000406838   0x0000000000000008   B      End_Time
0x0000000000406840   0x0000000000000004   B      Int_Glob
0x0000000000406844   0x0000000000000004   B      Bool_Glob
0x0000000000406848   0x0000000000000008   B      User_Time
0x0000000000406850   0x0000000000000008   B      Next_Ptr_Glob
0x0000000000406860   0x00000000000000c8   B      Arr_1_Glob
0x0000000000406928   0x0000000000000001   B      Ch_1_Glob

With gcc 8.5, changing the order of the variables in the source code does NOT 
change their order in the assembly or in the binary.
So the ordering is evidently decided when the assembly is generated, and with 
-fcommon the variables are arranged according to some particular logic rather 
than by source order.
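Background on the option itself, per the GCC manual: with -fcommon, an uninitialized file-scope definition such as `int Int_Glob;` is placed in a common block and its storage is allocated by the linker, while with -fno-common (the default since GCC 10) the compiler places it in the .bss section of the object file. GCC also provides the variable attributes `common` and `nocommon`, which override the command-line default for a single variable, so the effect can be probed per variable without switching the whole build. A minimal sketch assuming GCC; the choice of variables and the helper `addr_distance` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC variable attributes (assumes GCC): `common` requests common storage
 * even under -fno-common (the GCC 10+ default); `nocommon` requests space
 * allocated directly in the object file even under -fcommon.
 * Which variables to mark is a hypothetical choice for the experiment. */
int  Int_Glob  __attribute__((common));
long End_Time  __attribute__((common));
int  Bool_Glob __attribute__((nocommon));

/* Hypothetical helper: signed distance in bytes from `a` to `b`,
 * handy when diffing layouts between builds. */
ptrdiff_t addr_distance(const void *a, const void *b)
{
    return (ptrdiff_t)((uintptr_t)b - (uintptr_t)a);
}
```

Either way the variables are still zero-initialized; only who decides their placement changes.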


3.       If we modify the source code by adding some int arrays between the 
variables, gcc 10.3 performs similarly to gcc 8.5. From this we infer that the 
way the variables are cached changes in this case, and that this has a large 
impact on the problem.
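The caching argument in point 3 can be made concrete by mapping each variable to the cache lines it occupies. The sketch assumes 64-byte lines (the usual size on current Intel x86 cores; treat it as an assumption for SPR) and uses hypothetical helpers `cache_line_of`, `same_cache_line` and `lines_spanned`. Applied to the two listings, it shows, for example, that with gcc 10.3 Ch_2_Glob (0x406908) and Run_Index (0x406928) share one line together with the tail of Arr_1_Glob, while with gcc 8.5 Ch_2_Glob (0x406810) and Next_Ptr_Glob (0x406850) fall on different lines. Whether a given grouping helps or hurts is exactly what would need measuring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64u

/* Index of the cache line containing byte address `addr`. */
uintptr_t cache_line_of(uintptr_t addr)
{
    return addr / CACHE_LINE_SIZE;
}

/* True if the two byte addresses fall in the same cache line. */
bool same_cache_line(uintptr_t a, uintptr_t b)
{
    return cache_line_of(a) == cache_line_of(b);
}

/* Number of distinct cache lines touched by an object of `size` bytes at `addr`. */
size_t lines_spanned(uintptr_t addr, size_t size)
{
    assert(size > 0);
    return (size_t)(cache_line_of(addr + size - 1) - cache_line_of(addr)) + 1;
}
```

For instance, `lines_spanned(0x406840, 0xc8)` gives 4 for Arr_1_Glob in the gcc 10.3 build, the last of which is the line holding the small scalars.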

So the question is whether -fcommon includes any intentional performance 
optimization logic. If not, the difference may just be an incidental performance 
result. But the variable arrangement suggests that there is some particular 
logic behind it.

Best regards,
Clark Zhao


From: 赵海峰 [mailto:zju....@qq.com]
Sent: May 31, 2024 16:27
To: Zhaohaifeng(Clark,CIS-HCE) <zhaohaife...@huawei.com>
Subject: Fw: Re: Is fcommon related with performance optimization logic?



---Original---
From: "Andrew Pinski" <pins...@gmail.com>
Date: Thu, May 30, 2024 10:27 AM
To: "赵海峰" <zju....@qq.com>
Cc: "gcc" <gcc@gcc.gnu.org>
Subject: Re: Is fcommon related with performance optimization logic?

On Wed, May 29, 2024 at 7:13 PM 赵海峰 via Gcc wrote:
>
> Dear Sir/Madam,
>
> We found that on Intel SPR (Sapphire Rapids), UnixBench compiled with gcc 10.3 
> performs worse than when compiled with gcc 8.5 on the dhry2reg benchmark.
>
> I found that it is related to the -fcommon option, which is disabled by 
> default in 10.3. -fcommon places global variables at addresses in a particular 
> order in the bss section (as observed with nm -n), regardless of the order in 
> which they are defined in the source code.
>
> We are wondering whether -fcommon involves some special performance 
> optimization.
>
> (I have also posted this question to gcc-help. I hope to get some suggestions 
> on this mailing list. Sorry for the bother.)

This was already filed as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532 . But someone
needs to go in and do more analysis of what is going wrong. The
biggest difference for x86_64 is how the variables are laid out and by
whom (the compiler or the linker). There is some notion that
-fno-common increases the number of L1-dcache-load-misses, which
points to differences in variable layout causing the performance
difference. But nobody has gone and checked which variables are laid
out differently and why. I suspect that small changes in the
code/variables cause layout differences, which cause the cache misses
and hence the performance difference, almost entirely by accident.
I suspect adding -fdata-sections will cause another performance
difference here too. And there is not much GCC can do about this, since
it is "hard" to choose a data layout that always gives the best performance.
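One way to test the hypothesis that the difference is an accident of cache-line placement, rather than anything specific to -fcommon, is to take the placement of the hot scalars out of the equation and check whether the gcc 8.5 and gcc 10.3 builds then perform alike. A minimal sketch, assuming 64-byte lines and C11 `alignas`; which variables count as hot is an assumption, `starts_cache_line` is a hypothetical helper, and modifying the benchmark source like this is for diagnosis only:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Each of these variables starts a new
 * line, so no two of them can share one, whatever the declaration order or
 * -fcommon/-fdata-sections setting. Other, unaligned globals may still be
 * placed in the remaining bytes of a line. Diagnostic experiment only. */
#define LINE 64

alignas(LINE) int           Int_Glob;
alignas(LINE) int           Bool_Glob;   /* Boolean in Dhrystone; int here */
alignas(LINE) char          Ch_1_Glob;
alignas(LINE) char          Ch_2_Glob;
alignas(LINE) unsigned long Run_Index;

/* True if `p` is the first byte of a 64-byte cache line. */
int starts_cache_line(const void *p)
{
    return (uintptr_t)p % LINE == 0;
}
```

If performance still differs between the two compilers with this change, the cache-line explanation alone would look less likely.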

Thanks,
Andrew Pinski

>
> Best regards,
>
> Clark Zhao
