Hi Dibakar,

It's not surprising that dynamic and static binaries show some difference in 
instruction fetch behavior; however, these numbers do seem drastic, and also 
counterintuitive, since I would expect a dynamic binary to have more accesses 
due to trampoline calls. Have you looked at the number of branch and i-cache 
misses between the two? Also the number of CPU fetches?
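
If it's easier, here is a rough helper script I'd use to pull those counters 
out of the two stats.txt files side by side. The stat names in KEYS are from 
memory and may differ between gem5 versions, and the file paths and script 
name are just placeholders for wherever your two runs wrote their output:

    # compare_stats.py (hypothetical helper): pull a few fetch-related
    # counters out of two gem5 stats.txt files for a side-by-side look.
    # Stat names are from memory and may differ between gem5 versions.
    import sys

    KEYS = ('icache.overall_accesses', 'icache.overall_misses',
            'fetch.Branches', 'fetch.CacheLines', 'branchPred')

    def grab(path):
        # stats.txt lines look like: <name> <value> # <description>
        stats = {}
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) >= 2 and any(k in parts[0] for k in KEYS):
                    stats[parts[0]] = parts[1]
        return stats

    static, dynamic = grab(sys.argv[1]), grab(sys.argv[2])
    for name in sorted(set(static) | set(dynamic)):
        print('%-50s %15s %15s' % (name,
                                   static.get(name, '-'),
                                   dynamic.get(name, '-')))

Run it as, say, python compare_stats.py m5out_static/stats.txt 
m5out_dynamic/stats.txt, pointing it at the stats.txt from each run (the 
output directory names here are just examples; use gem5's -d/--outdir option 
to keep the two runs separate).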

The static binary will certainly be larger if you're compiling in all those 
libs, and it will occupy more space in memory. But libquantum doesn't make 
many library calls, and it seems you're only running a single process, so we 
won't have redundant library code in memory. The only way I can imagine this 
happening is if library code were somehow getting mixed with the application 
code in the same cache line (e.g., because the linker is inlining some of the 
library calls), causing more fetches to go to the i-cache, whereas the 
dynamically linked binary may be serving most of its fetches out of the fetch 
buffer.
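
If you want to poke at that last theory, one cheap experiment is to shrink the 
O3 fetch buffer and see whether the dynamic binary's i-cache access count 
climbs back toward the static one. I believe the buffer width is exposed as 
fetchBufferSize (in bytes) on the O3 CPU object, but the exact parameter name 
may differ in your tree, so treat this as a sketch; it would go near the end 
of configs/example/se.py, after the CPU list has been created:

    # Sketch: shrink the fetch buffer so nearly every fetch goes to the
    # i-cache. fetchBufferSize is the O3 fetch width in bytes; check
    # src/cpu/o3/O3CPU.py for the exact parameter name and default in
    # your gem5 version.
    for cpu in system.cpu:
        cpu.fetchBufferSize = 16    # default is typically 64, i.e. one cache line

If the gap between the two binaries shrinks (or flips) with a smaller buffer, 
that would point at fetch-buffer reuse rather than the caches themselves.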

-Tony

-----Original Message-----
From: gem5-dev [mailto:[email protected]] On Behalf Of Dibakar Gope
Sent: Monday, May 30, 2016 9:06 AM
To: [email protected]
Subject: [gem5-dev] Difference in i-cache accesses between dynamic and static 
linked binaries (SE mode)

Hi All,


I have been trying out the new dynamic linking patches for SE mode posted by 
Brandon Potter. I am doing some dry runs with the libquantum (SPEC CPU2006) 
workload, so I have two binaries for libquantum --- one statically linked and 
the other dynamically linked. However, I am finding some huge differences in 
i-cache accesses when running these two binaries. I am running x86 with the 
classic memory model. Here are the differences in stats between these two runs:


static libquantum binary (flags used while compiling: -static -m64 -Os 
-mfpmath=sse -msse3):

$ ldd ./libquantum_base.amd64-m64-gcc41-nn_static_link
not a dynamic executable


commandline: ./build/X86/gem5.opt configs/example/se.py --caches --l2cache 
--cmd=./libquantum_base.amd64-m64-gcc41-nn_static_link  -o "15 2" 
--cpu-type=detailed


system.cpu.icache.overall_accesses::cpu.inst      2386350   # number of overall (read+write) accesses
system.cpu.commit.committedInsts                 14339244   # Number of instructions committed
system.cpu.commit.committedOps                   23640619   # Number of ops (including micro ops) committed
system.cpu.ipc                                   1.359009   # IPC: Instructions Per Cycle
sim_seconds                                      0.005276   # Number of seconds simulated
system.cpu.fetch.Branches                         3780944   # Number of branches that fetch encountered
system.cpu.fetch.predictedBranches                2249794   # Number of branches that fetch has predicted taken




dynamic libquantum binary (flags used while compiling: -m64 -Os -mfpmath=sse 
-msse3):


$ ldd ./libquantum_base.amd64-m64-gcc41-nn_dyn_link
linux-vdso.so.1 =>  (0x00007ffc62d28000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f18608c0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1860500000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1860bc0000)



commandline: ./build/X86/gem5.opt configs/example/se.py --caches --l2cache 
--cmd=./libquantum_base.amd64-m64-gcc41-nn_dyn_link  -o "15 2" 
--cpu-type=detailed

system.cpu.icache.overall_accesses::cpu.inst       840649   # number of overall (read+write) accesses
system.cpu.commit.committedInsts                 14503123   # Number of instructions committed
system.cpu.commit.committedOps                   23942141   # Number of ops (including micro ops) committed
system.cpu.ipc                                   1.346305   # IPC: Instructions Per Cycle
sim_seconds                                      0.005386   # Number of seconds simulated
system.cpu.fetch.Branches                         3846406   # Number of branches that fetch encountered
system.cpu.fetch.predictedBranches                2283756   # Number of branches that fetch has predicted taken

So I was wondering why there is such a huge difference in i-cache accesses 
(2386350 with static vs. 840649 with dynamic linking) between these two runs, 
although the IPC and total committed instructions are almost the same. Any 
thoughts?

Thanks,
Dibakar Gope
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev