Ok, will do. That should be easy enough.

Gabe
On 12/13/10 16:47, Ali Saidi wrote:
> I've got a patch that gets closer to supporting caches between the TLB
> and L2 cache, but it doesn't work. Since we don't have a way to
> invalidate addresses out of the cache when you switch a memory region to
> uncacheable, the address remains in the cache and causes all sorts of
> havoc. If you want to make some changes, please make them x86-only for
> now. We'll need to implement some form of cache cleaning or invalidation
> before this can be supported on ARM.
>
> Thanks,
> Ali
>
> On Dec 13, 2010, at 4:56 AM, Gabe Black wrote:
>
>> I finally got around to trying this out (patch attached), and it seemed
>> to fix x86. This change seems to break ARM_FS, though. It faults when
>> it tries to execute code at the fault vector because the page table
>> entry is apparently marked no-execute (I think). That makes the timing
>> CPU spin around and around: it keeps getting a fault, invoking it, and
>> attempting to fetch again. The call stack never reaches a point where
>> it has to wait for an event, so it never collapses back down; it
>> recurses until the stack is too big and M5 segfaults. The atomic CPU
>> seems to just get lost, and I'm not entirely sure what's going on
>> there, although I suspect the atomic CPU is simply structured
>> differently and doesn't recurse infinitely.
>>
>> I wanted to ask the ARM folks if they knew what was going on here. Is
>> something about the page table walk supposed to be uncached but isn't?
>> This seems to work without that cache added in, so I suspect the walker
>> is picking up stale data or something.
>>
>> Gabe
>>
>> Gabe Black wrote:
>>
>>> Of these, I think the walker cache sounds better, for two reasons.
>>> First, it avoids the L1 pollution Ali was talking about, and second,
>>> a new bus would add mostly inert stuff on the way to memory and would
>>> involve looking up which port to use even though it'd always be the
>>> same one. I'll give that a try.
>>>
>>> Gabe
>>>
>>> Steve Reinhardt wrote:
>>>
>>>> I think the two easy (Python-only) solutions are sharing the
>>>> existing L1 via a bus and tacking a small L1 onto the walker. Which
>>>> one is more realistic would depend on what you're trying to model.
>>>>
>>>> Steve
>>>>
>>>> On Tue, Nov 23, 2010 at 8:23 AM, Ali Saidi <sa...@umich.edu> wrote:
>>>>
>>>>> So what is the relatively good way to make this work in the short
>>>>> term? A bus? What about the slightly better version? I suppose a
>>>>> small cache might be OK and probably somewhat realistic.
>>>>>
>>>>> Thanks,
>>>>> Ali
>>>>>
>>>>> On Tue, 23 Nov 2010 08:15:01 -0800, Steve Reinhardt
>>>>> <ste...@gmail.com> wrote:
>>>>>
>>>>>> And even though I do think it could be made to work, I'm not sure
>>>>>> it would be easy or a good idea. There are a lot of corner cases
>>>>>> to worry about, especially for writes, since you'd have to
>>>>>> actually buffer the write data somewhere, as opposed to just
>>>>>> remembering that so-and-so has requested an exclusive copy.
>>>>>>
>>>>>> Actually, as I think about it, that might be the case that's
>>>>>> breaking now... if the L1 has an exclusive copy and then it snoops
>>>>>> a write (and not a read-exclusive), I'm guessing it will just
>>>>>> invalidate its copy, losing the modifications. I wouldn't be
>>>>>> terribly surprised if reads are working OK (the L1 should snoop
>>>>>> those and respond if it's the owner), and of course it's all OK
>>>>>> if the L1 doesn't have a copy of the block.
>>>>>>
>>>>>> So maybe there is a relatively easy way to make this work, but
>>>>>> figuring out whether that's true and then testing it is still a
>>>>>> non-trivial amount of effort.
>>>>>>
>>>>>> Steve
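The write hazard Steve describes can be sketched with a toy model. This is
plain Python for illustration only, not M5 code; the two-state cache here
is a deliberate simplification of the real protocol:

    # Toy illustration of the snoop hazard described above: an owner that
    # treats a snooped plain write as a bare invalidation silently drops
    # its dirty data, while responding as owner on snooped reads is safe.
    class ToyL1:
        def __init__(self):
            self.blocks = {}  # addr -> (state, data); states: 'M', 'S'

        def snoop_read(self, addr):
            # Safe case: the owner supplies the data and downgrades to
            # shared, so nothing is lost.
            if addr in self.blocks and self.blocks[addr][0] == 'M':
                data = self.blocks[addr][1]
                self.blocks[addr] = ('S', data)
                return data  # respond as owner
            return None

        def snoop_write(self, addr):
            # The suspect case: invalidate on any snooped write. If the
            # block was modified, the modifications are lost, because a
            # plain write (unlike a read-exclusive) never collected the
            # owner's copy first.
            self.blocks.pop(addr, None)

    l1 = ToyL1()
    l1.blocks[0x1000] = ('M', 'dirty page table entry')
    l1.snoop_write(0x1000)  # a walker's word-size write: dirty data gone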
>>>>>> On Tue, Nov 23, 2010 at 7:57 AM, Steve Reinhardt
>>>>>> <ste...@gmail.com> wrote:
>>>>>>
>>>>>>> No, when the L2 receives a request it assumes the L1s above it
>>>>>>> have already been snooped, which is true since the request came
>>>>>>> in on the bus that the L1s snoop. The issue is that caches don't
>>>>>>> necessarily behave correctly when non-cache-block requests come
>>>>>>> in through their mem-side (snoop) port rather than through their
>>>>>>> cpu-side (request) port. I'm guessing this could be made to work;
>>>>>>> I'd just be very surprised if it does right now, since the caches
>>>>>>> weren't designed to deal with this case and aren't tested this
>>>>>>> way.
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>> On Tue, Nov 23, 2010 at 7:50 AM, Ali Saidi <sa...@umich.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Does it? Shouldn't the L2 receive the request, ask for the
>>>>>>>> block, and end up snooping the L1s?
>>>>>>>>
>>>>>>>> Ali
>>>>>>>>
>>>>>>>> On Tue, 23 Nov 2010 07:30:00 -0800, Steve Reinhardt
>>>>>>>> <ste...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The point is that connecting between the L1 and L2 induces the
>>>>>>>>> same problems with respect to the L1 that connecting directly
>>>>>>>>> to memory induces with respect to the whole cache hierarchy.
>>>>>>>>> You're just statistically more likely to get away with it in
>>>>>>>>> the former case because the L1 is smaller.
>>>>>>>>>
>>>>>>>>> Steve
>>>>>>>>>
>>>>>>>>> On Tue, Nov 23, 2010 at 7:16 AM, Ali Saidi <sa...@umich.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Where are you connecting the table walker? If it's between
>>>>>>>>>> the L1 and L2, my guess is that it will work. If it's to the
>>>>>>>>>> memory bus, then yes, memory is just responding without the
>>>>>>>>>> help of a cache, and that could be the reason.
>>>>>>>>>>
>>>>>>>>>> Ali
>>>>>>>>>>
>>>>>>>>>> On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black
>>>>>>>>>> <gbl...@eecs.umich.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think I may have just now. I've fixed a few issues, and am
>>>>>>>>>>> now getting to the point where something that should be in
>>>>>>>>>>> the page tables is causing a page fault. I found where the
>>>>>>>>>>> table walker is walking the tables for this particular
>>>>>>>>>>> access, and the last-level entry is all 0s. There could be a
>>>>>>>>>>> number of reasons it's all 0s, but since the main difference
>>>>>>>>>>> other than timing between this and a working configuration
>>>>>>>>>>> is the presence of caches, and we've identified a potential
>>>>>>>>>>> issue there, I'm inclined to suspect the actual page table
>>>>>>>>>>> entry is still in the L1 and hasn't been evicted out to
>>>>>>>>>>> memory yet.
>>>>>>>>>>>
>>>>>>>>>>> To fix this, is the best solution to add a bus below the CPU
>>>>>>>>>>> for all the connections that need to go to the L1? I'm
>>>>>>>>>>> assuming they'd all go into the dcache, since they're more
>>>>>>>>>>> data-like and that keeps the icache read-only (ignoring SMC
>>>>>>>>>>> issues), and the dcache probably services lower bandwidth
>>>>>>>>>>> normally. It also seems a little strange that this type of
>>>>>>>>>>> configuration happens in the BaseCPU.py SimObject Python
>>>>>>>>>>> file and not in a configuration file, but I could be
>>>>>>>>>>> convinced there's a reason. Even if this isn't really a
>>>>>>>>>>> "fix" or the "right thing" to do, I'd still like to try it
>>>>>>>>>>> temporarily, at least to see whether it corrects the problem
>>>>>>>>>>> I'm seeing.
>>>>>>>>>>>
>>>>>>>>>>> Gabe
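A rough sketch of the bus-below-the-CPU arrangement Gabe proposes here,
written in M5 config style. This is hypothetical: the walker_bus name and
the exact wiring are illustrative (the patch at the end of this thread
takes the separate walker-cache approach instead), and cpu is assumed to
come from the surrounding script:

    # Hypothetical fragment: a small bus in front of the dcache would let
    # the CPU's data port and both TLB walkers share the existing L1.
    from m5.objects import *

    cpu.walker_bus = Bus()
    cpu.dcache_port = cpu.walker_bus.port       # CPU data accesses
    cpu.itb.walker.port = cpu.walker_bus.port   # instruction-side walker
    cpu.dtb.walker.port = cpu.walker_bus.port   # data-side walker
    cpu.dcache.cpu_side = cpu.walker_bus.port   # bus feeds the shared dcache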
>>>>>>>>>>> Ali Saidi wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I haven't seen any strange behavior yet. That isn't to say
>>>>>>>>>>>> it's not going to cause an issue in the future, but we've
>>>>>>>>>>>> taken many a TLB miss and it hasn't fallen over yet.
>>>>>>>>>>>>
>>>>>>>>>>>> Ali
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt
>>>>>>>>>>>> <ste...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yeah, I just got around to reading this thread, and that
>>>>>>>>>>>>> was the point I was going to make... the L1 cache
>>>>>>>>>>>>> effectively serves as a translator between the CPU's
>>>>>>>>>>>>> word-size read and write requests and the coherent
>>>>>>>>>>>>> block-level requests that get snooped. If you attach a
>>>>>>>>>>>>> CPU-like device (such as the table walker) directly to an
>>>>>>>>>>>>> L2, the CPU-like accesses that go to the L2 will get sent
>>>>>>>>>>>>> to the L1s, but I'm not sure they'll be handled correctly.
>>>>>>>>>>>>> Not that they fundamentally couldn't be; this just isn't a
>>>>>>>>>>>>> configuration we test, so it's likely that there are
>>>>>>>>>>>>> problems... for example, the L1 may try to hand ownership
>>>>>>>>>>>>> to the requester, but the requester won't recognize that,
>>>>>>>>>>>>> and things will break.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black
>>>>>>>>>>>>> <gbl...@eecs.umich.edu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> What happens if an entry is in the L1 but not the L2?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Gabe
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ali Saidi wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Between the L1 and L2 caches seems like a good place to
>>>>>>>>>>>>>>> me. The caches can cache page table entries; otherwise a
>>>>>>>>>>>>>>> TLB miss would be even more expensive than it is. The L1
>>>>>>>>>>>>>>> isn't normally used for such things, since it would get
>>>>>>>>>>>>>>> polluted (look at why SPARC has a "load 128 bits from
>>>>>>>>>>>>>>> L2, do not allocate into L1" instruction).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ali
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For anybody waiting for an x86 FS regression (yes, I
>>>>>>>>>>>>>>>> know, you can all hardly wait, but don't let this spoil
>>>>>>>>>>>>>>>> your Thanksgiving): I'm getting closer to having it
>>>>>>>>>>>>>>>> working, but I've discovered some issues with the
>>>>>>>>>>>>>>>> mechanisms behind the --caches flag with fs.py and x86.
>>>>>>>>>>>>>>>> I'm surprised I never thought to try it before. It also
>>>>>>>>>>>>>>>> brings up some questions about where the table walkers
>>>>>>>>>>>>>>>> should be hooked up in x86 and ARM. Currently it's
>>>>>>>>>>>>>>>> after the L1, if any, but before the L2, if any, which
>>>>>>>>>>>>>>>> seems wrong to me. Also, caches don't seem to propagate
>>>>>>>>>>>>>>>> requests upwards to the CPUs, which may or may not be
>>>>>>>>>>>>>>>> an issue. I'm still looking into that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Gabe
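For reference, the hookup Gabe questions ("after the L1, if any, but
before the L2") corresponds to the pre-patch wiring in src/cpu/BaseCPU.py,
the removed lines in the patch below:

    # Pre-patch wiring: the walker ports join the CPU's generic
    # memory-port list, so connectMemPorts() attaches them to whatever
    # bus it is given. With an L2 configured, that bus is the L1-to-L2
    # bus, which places the walkers below the L1s but above the L2.
    if buildEnv['FULL_SYSTEM']:
        if buildEnv['TARGET_ISA'] in ['x86', 'arm']:
            self._mem_ports += ["itb.walker.port", "dtb.walker.port"]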
>> X86, ARM: Add L1 caches for the TLB walkers.
>>
>> Small L1 caches are connected to the TLB walkers when caches are used.
>> This allows them to participate in the coherence protocol properly.
>>
>> diff --git a/configs/common/CacheConfig.py b/configs/common/CacheConfig.py
>> --- a/configs/common/CacheConfig.py
>> +++ b/configs/common/CacheConfig.py
>> @@ -43,8 +43,14 @@
>>
>>  for i in xrange(options.num_cpus):
>>      if options.caches:
>> -        system.cpu[i].addPrivateSplitL1Caches(L1Cache(size = '32kB'),
>> -                                              L1Cache(size = '64kB'))
>> +        if buildEnv['TARGET_ISA'] in ['x86', 'arm']:
>> +            system.cpu[i].addPrivateSplitL1Caches(L1Cache(size = '32kB'),
>> +                                                  L1Cache(size = '64kB'),
>> +                                                  PageTableWalkerCache(),
>> +                                                  PageTableWalkerCache())
>> +        else:
>> +            system.cpu[i].addPrivateSplitL1Caches(L1Cache(size = '32kB'),
>> +                                                  L1Cache(size = '64kB'))
>>      if options.l2cache:
>>          system.cpu[i].connectMemPorts(system.tol2bus)
>>      else:
>> diff --git a/configs/common/Caches.py b/configs/common/Caches.py
>> --- a/configs/common/Caches.py
>> +++ b/configs/common/Caches.py
>> @@ -42,6 +42,14 @@
>>      mshrs = 20
>>      tgts_per_mshr = 12
>>
>> +class PageTableWalkerCache(BaseCache):
>> +    assoc = 2
>> +    block_size = 64
>> +    latency = '1ns'
>> +    mshrs = 10
>> +    size = '1kB'
>> +    tgts_per_mshr = 12
>> +
>>  class IOCache(BaseCache):
>>      assoc = 8
>>      block_size = 64
>> diff --git a/src/cpu/BaseCPU.py b/src/cpu/BaseCPU.py
>> --- a/src/cpu/BaseCPU.py
>> +++ b/src/cpu/BaseCPU.py
>> @@ -166,7 +166,7 @@
>>              if p != 'physmem_port':
>>                  exec('self.%s = bus.port' % p)
>>
>> -    def addPrivateSplitL1Caches(self, ic, dc):
>> +    def addPrivateSplitL1Caches(self, ic, dc, iwc = None, dwc = None):
>>          assert(len(self._mem_ports) < 8)
>>          self.icache = ic
>>          self.dcache = dc
>> @@ -175,12 +175,17 @@
>>          self._mem_ports = ['icache.mem_side', 'dcache.mem_side']
>>          if buildEnv['FULL_SYSTEM']:
>>              if buildEnv['TARGET_ISA'] in ['x86', 'arm']:
>> -                self._mem_ports += ["itb.walker.port", "dtb.walker.port"]
>> +                self.itb_walker_cache = iwc
>> +                self.dtb_walker_cache = dwc
>> +                self.itb.walker.port = iwc.cpu_side
>> +                self.dtb.walker.port = dwc.cpu_side
>> +                self._mem_ports += ["itb_walker_cache.mem_side", \
>> +                                    "dtb_walker_cache.mem_side"]
>>              if buildEnv['TARGET_ISA'] == 'x86':
>>                  self._mem_ports += ["interrupts.pio", "interrupts.int_port"]
>>
>> -    def addTwoLevelCacheHierarchy(self, ic, dc, l2c):
>> -        self.addPrivateSplitL1Caches(ic, dc)
>> +    def addTwoLevelCacheHierarchy(self, ic, dc, l2c, iwc = None, dwc = None):
>> +        self.addPrivateSplitL1Caches(ic, dc, iwc, dwc)
>>          self.toL2Bus = Bus()
>>          self.connectMemPorts(self.toL2Bus)
>>          self.l2cache = l2c
>> diff --git a/src/cpu/o3/O3CPU.py b/src/cpu/o3/O3CPU.py
>> --- a/src/cpu/o3/O3CPU.py
>> +++ b/src/cpu/o3/O3CPU.py
>> @@ -141,7 +141,7 @@
>>      smtROBThreshold = Param.Int(100, "SMT ROB Threshold Sharing Parameter")
>>      smtCommitPolicy = Param.String('RoundRobin', "SMT Commit Policy")
>>
>> -    def addPrivateSplitL1Caches(self, ic, dc):
>> -        BaseCPU.addPrivateSplitL1Caches(self, ic, dc)
>> +    def addPrivateSplitL1Caches(self, ic, dc, iwc = None, dwc = None):
>> +        BaseCPU.addPrivateSplitL1Caches(self, ic, dc, iwc, dwc)
>>          self.icache.tgts_per_mshr = 20
>>          self.dcache.tgts_per_mshr = 20
>> diff --git a/tests/configs/realview-simple-atomic.py b/tests/configs/realview-simple-atomic.py
>> --- a/tests/configs/realview-simple-atomic.py
>> +++ b/tests/configs/realview-simple-atomic.py
>> @@ -53,6 +53,17 @@
>>      write_buffers = 8
>>
>>  # ---------------------
>> +# Page table walker cache
>> +# ---------------------
>> +class PageTableWalkerCache(BaseCache):
>> +    assoc = 2
>> +    block_size = 64
>> +    latency = '1ns'
>> +    mshrs = 10
>> +    size = '1kB'
>> +    tgts_per_mshr = 12
>> +
>> +# ---------------------
>>  # I/O Cache
>>  # ---------------------
>>  class IOCache(BaseCache):
>> @@ -86,7 +97,9 @@
>>
>>  #connect up the cpu and l1s
>>  cpu.addPrivateSplitL1Caches(L1(size = '32kB', assoc = 1),
>> -                            L1(size = '32kB', assoc = 4))
>> +                            L1(size = '32kB', assoc = 4),
>> +                            PageTableWalkerCache(),
>> +                            PageTableWalkerCache())
>>  # connect cpu level-1 caches to shared level-2 cache
>>  cpu.connectMemPorts(system.toL2Bus)
>>  cpu.clock = '2GHz'
>> diff --git a/tests/configs/realview-simple-timing.py b/tests/configs/realview-simple-timing.py
>> --- a/tests/configs/realview-simple-timing.py
>> +++ b/tests/configs/realview-simple-timing.py
>> @@ -54,6 +54,17 @@
>>      write_buffers = 8
>>
>>  # ---------------------
>> +# Page table walker cache
>> +# ---------------------
>> +class PageTableWalkerCache(BaseCache):
>> +    assoc = 2
>> +    block_size = 64
>> +    latency = '1ns'
>> +    mshrs = 10
>> +    size = '1kB'
>> +    tgts_per_mshr = 12
>> +
>> +# ---------------------
>>  # I/O Cache
>>  # ---------------------
>>  class IOCache(BaseCache):
>> @@ -88,7 +99,9 @@
>>
>>  #connect up the cpu and l1s
>>  cpu.addPrivateSplitL1Caches(L1(size = '32kB', assoc = 1),
>> -                            L1(size = '32kB', assoc = 4))
>> +                            L1(size = '32kB', assoc = 4),
>> +                            PageTableWalkerCache(),
>> +                            PageTableWalkerCache())
>>  # connect cpu level-1 caches to shared level-2 cache
>>  cpu.connectMemPorts(system.toL2Bus)
>>  cpu.clock = '2GHz'
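For reference, a minimal sketch of how the new arguments fit together,
assembled from the hunks above. The class parameters and the call
signature are taken directly from the patch; the cpu and system objects
are assumed to come from the surrounding config script:

    from m5.objects import *

    # Walker cache, as defined in the Caches.py hunk above: small (1kB),
    # so the walkers participate in coherence without polluting the L1s.
    class PageTableWalkerCache(BaseCache):
        assoc = 2
        block_size = 64
        latency = '1ns'
        mshrs = 10
        size = '1kB'
        tgts_per_mshr = 12

    # One walker cache each for the instruction- and data-side TLB
    # walkers, passed alongside the ordinary split L1s (as in the
    # CacheConfig.py hunk above).
    cpu.addPrivateSplitL1Caches(L1Cache(size = '32kB'),
                                L1Cache(size = '64kB'),
                                PageTableWalkerCache(),
                                PageTableWalkerCache())
    cpu.connectMemPorts(system.tol2bus)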