Hi all,
I am running PARSEC on X86 with 2 cores and 2 threads. I fast-forward up
to the ROI and then switch to the detailed (O3) CPU. I also dump and reset
the statistics periodically.
I tried the private L2 cache configuration by modifying my CacheConfig.py
file, following the approach from previous threads. It worked for the first
part of the simulation, where I use the AtomicSimpleCPU (I can see l20 and
l21 entries in both the stats file and the config file). However, after the
switch to the O3 CPU, far fewer L2 statistics are dumped, and there are no
miss stats such as system.l20.overall_misses::total; only hits are
reported. The hits also appear to be exactly equal to the number of
accesses in every dump. Do you have any idea why that could be the case?
When the CPUs are switched, do the new CPUs get the same cache
configuration (as in the CacheConfig.py file), or should I do something
extra? This problem is probably independent of the private L2
configuration, because I observed the same issue with a single shared L2
(the default CacheConfig.py).

   - The L2 stats for the AtomicSimpleCPU (before switching) look like this
   (only the overall_misses portion is shown):

...
system.l20.overall_misses::cpu0.dtb.walker       123    # number of overall misses
system.l20.overall_misses::cpu0.itb.walker        78    # number of overall misses
system.l20.overall_misses::cpu0.inst           29508    # number of overall misses
system.l20.overall_misses::cpu0.data          636699    # number of overall misses
system.l20.overall_misses::total              666408    # number of overall misses
...
system.l21.overall_misses::cpu1.dtb.walker        73    # number of overall misses
system.l21.overall_misses::cpu1.itb.walker        64    # number of overall misses
system.l21.overall_misses::cpu1.inst            7609    # number of overall misses
system.l21.overall_misses::cpu1.data          151664    # number of overall misses
system.l21.overall_misses::total              159410    # number of overall misses
...



   - After switching the CPU, the L2 stats look as given below. The L2
   section gets shorter and overall_misses is not reported. It is also odd
   that overall_accesses::total and overall_hits::total are exactly the
   same for both caches.


system.l21.overall_hits::total                            1329    # number of overall hits
system.l21.ReadReq_accesses::switch_cpus1.dtb.walker        21    # number of ReadReq accesses(hits+misses)
system.l21.ReadReq_accesses::switch_cpus1.data            1150    # number of ReadReq accesses(hits+misses)
system.l21.ReadReq_accesses::total                        1171    # number of ReadReq accesses(hits+misses)
system.l21.Writeback_accesses::writebacks                  595    # number of Writeback accesses(hits+misses)
system.l21.Writeback_accesses::total                       595    # number of Writeback accesses(hits+misses)
system.l21.ReadExReq_accesses::switch_cpus1.data           158    # number of ReadExReq accesses(hits+misses)
system.l21.ReadExReq_accesses::total                       158    # number of ReadExReq accesses(hits+misses)
system.l21.demand_accesses::switch_cpus1.dtb.walker         21    # number of demand (read+write) accesses
system.l21.demand_accesses::switch_cpus1.data             1308    # number of demand (read+write) accesses
system.l21.demand_accesses::total                         1329    # number of demand (read+write) accesses
system.l21.overall_accesses::switch_cpus1.dtb.walker        21    # number of overall (read+write) accesses
system.l21.overall_accesses::switch_cpus1.data            1308    # number of overall (read+write) accesses
system.l21.overall_accesses::total                        1329    # number of overall (read+write) accesses
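
To check the "hits equal accesses" observation across every dump without
reading the file by eye, something like the following could be used. This
is only a rough sketch, not part of gem5: the helper names parse_stats and
implied_misses are made up for illustration, and it assumes each stat sits
on a single line (name, integer value, then a "#" comment), as in the
unwrapped stats.txt.

```python
import re

# Matches "<stat name> <integer value>", optionally followed by a comment.
# Non-integer values (e.g. rates) are deliberately skipped.
STAT_RE = re.compile(r"^(\S+)\s+(-?\d+)(?:\s|$)")


def parse_stats(text):
    """Return {stat_name: int_value} for every integer-valued stat line."""
    stats = {}
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def implied_misses(stats, cache):
    """Return overall accesses minus overall hits for one cache.

    Returns None if either total is absent from the parsed dump.
    """
    hits = stats.get(cache + ".overall_hits::total")
    accesses = stats.get(cache + ".overall_accesses::total")
    if hits is None or accesses is None:
        return None
    return accesses - hits


# Two lines taken from the post-switch dump above.
sample = (
    "system.l21.overall_hits::total       1329  # number of overall hits\n"
    "system.l21.overall_accesses::total   1329  # number of overall (read+write) accesses\n"
)

if __name__ == "__main__":
    print(implied_misses(parse_stats(sample), "system.l21"))
```

For the two l21 lines above the difference is 0, which matches what I see
by eye. The same function could be applied to each dump section in turn.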

I am also adding the CacheConfig.py modified for private L2 configuration
just in case.

import sys  # needed for sys.exit() below

import m5
from m5.objects import *
from Caches import *
def config_cache(options, system):
    if options.cpu_type == "arm_detailed":
        try:
            from O3_ARM_v7a import *
        except:
            print "arm_detailed is unavailable. Did you compile the O3 model?"
            sys.exit(1)
        dcache_class, icache_class, l2_cache_class = \
            O3_ARM_v7a_DCache, O3_ARM_v7a_ICache, O3_ARM_v7aL2
    else:
        dcache_class, icache_class, l2_cache_class = \
            L1Cache, L1Cache, L2Cache
    if options.l2cache:
        # Provide a clock for the L2 and the L1-to-L2 bus here as they
        # are not connected using addTwoLevelCacheHierarchy. Use the
        # same clock as the CPUs, and set the L1-to-L2 bus width to 32
        # bytes (256 bits).
        system.l2 = [l2_cache_class(clock=options.clock,
                                    size=options.l2_size,
                                    assoc=options.l2_assoc,
                                    block_size=options.cacheline_size)
                     for i in xrange(options.num_cpus)]

        system.tol2bus = [CoherentBus(clock=options.clock, width=32)
                          for i in xrange(options.num_cpus)]
        #system.l2.cpu_side = system.tol2bus.master
        #system.l2.mem_side = system.membus.slave
    for i in xrange(options.num_cpus):
        if options.caches:
            icache = icache_class(size=options.l1i_size,
                                  assoc=options.l1i_assoc,
                                  block_size=options.cacheline_size)
            dcache = dcache_class(size=options.l1d_size,
                                  assoc=options.l1d_assoc,
                                  block_size=options.cacheline_size)
            # When connecting the caches, the clock is also inherited
            # from the CPU in question
            if buildEnv['TARGET_ISA'] == 'x86':
                system.cpu[i].addPrivateSplitL1Caches(icache, dcache,
                                                      PageTableWalkerCache(),
                                                      PageTableWalkerCache())
            else:
                system.cpu[i].addPrivateSplitL1Caches(icache, dcache)
        system.cpu[i].createInterruptController()
        if options.l2cache:
            system.l2[i].cpu_side = system.tol2bus[i].master
            system.l2[i].mem_side = system.membus.slave
            system.cpu[i].connectAllPorts(system.tol2bus[i], system.membus)
        else:
            system.cpu[i].connectAllPorts(system.membus)
    return system


Best,
Fulya