How does switching in and out of PAL mode work? Could we take advantage of that somehow? I don't have a really good idea of how that might help. I'm just throwing the idea out there.
Gabe

Gabe Black wrote:
> nathan binkert wrote:
>
>>> I'm trying to factor out PAL mode from O3, and I found the following
>>> line
>>>
>>> http://repo.m5sim.org/m5/file/fc12f4d657f0/src/cpu/o3/fetch_impl.hh#l579
>>>
>>> } else if (interruptPending && !(fetch_PC & 0x3)) {
>>>
>>> which is basically making fetch stop fetching once it sees that an
>>> interrupt is pending and it's no longer fetching PAL mode code. Why is
>>> that there? If interrupts are blocked while in PAL mode, why doesn't O3
>>> just rely on commit waiting until committed state is out of PAL mode and
>>> then start handling the interrupt?
>>>
>> Interrupts cannot be taken during palmode. I'm not sure how this
>> would be handled from commit. What needs to happen is that fetch
>> cannot be redirected to the interrupt handler until the CPU exits PAL
>> (i.e. the pal routine needs to finish without being interrupted at
>> all).
>>
>
> I think what happens is that commit asks the Interrupts if there's an
> interrupt for it to service, if so it gets it and calls its invoke
> method, and that, when it sets state through the threadcontext, flushes
> out the rest of the machine and restarts execution. Fetch is just along
> for the ride, and the Interrupts object is already deciding when it's
> appropriate to take an interrupt.
>
>>> This behavior would be very annoying
>>> and perhaps impossible to preserve in an ISA agnostic way and I'd really
>>> like to just get rid of it, but if there's a good reason for it I'll
>>> have to be more creative.
>>>
>> We could have a more generic ISA specific function
>> areInterruptsEnabled() which would be inlined to return true in most
>> ISAs.
>>
>
> Interrupts aren't going to be universally enabled in most ISAs, and the
> state they use to decide that is going to usually be different than the
> PC.
> They would need to read that state through the thread context which
> reflects committed state, and the Interrupts object is already doing
> that in the commit stage.
>
>>> It innocuously sits there inert for SPARC and
>>> MIPS, but for ARM in thumb mode with 16 bit instructions and especially
>>> X86 where the PC can be anything with no special meaning this will cause
>>> a serious problem. This is something I'd like to have straightened out
>>> in the short to medium term.
>>>
>> Seems perfectly reasonable and not too difficult to me.
>>
>>> Longer term I see a problem with the way PAL mode is communicated to the
>>> ITLB as well. First, the PC is passed as part of a request as a single
>>> address. This would make sense if the PC was the actual memory address
>>> used for the fetch, but that's already in the vaddr field. Generalized
>>> PCs have more structure than that. Second, this is the mangled PC which
>>> is only there to signal PAL mode in Alpha and has no current use in any
>>> other ISA. Even in Alpha only one bit actually matters. Carting this
>>> value around for this one (admittedly important) use is a waste. This is
>>> slightly a lie because the stride prefetcher also uses the PC field, but
>>> it looks like it could (and realistically would) use the paddr or vaddr
>>> instead. Third, its use depends on the fact that the magical property
>>> communicated by the PC carries over to all the instructions being
>>> fetched as part of that access, or in other words effectively the cache
>>> line. This is something that happens to work for Alpha but won't work
>>> generally. Fourth, if the PAL mode bit is pulled out of the fetch address
>>> used by the CPUs so that they truly deal with memory addresses and not
>>> architectural PC values, the architectural PC, and effectively the
>>> PAL-ness of the Alpha PC, won't be accessible to the ITLB anymore.
>>> It looks like we can either have the PC based PAL mode bit in Alpha or more
>>> complete CPU model abstraction universally but not both.
>>>
>> I'm not clear on why Alpha couldn't keep the PAL mode bit in the PC.
>> Seems that any ISA independent code (i.e. the CPU models) when asking
>> for the PC should get the masked value. The ISA dependent code (i.e.
>> stuff in arch/alpha) should get the unfiltered one. Functions like
>> areInterruptsEnabled would be machine dependent and would have access
>> to the ExecContext and the PC structure to grab the pal mode bit.
>>
>
> The ExecContext is only for instructions when they execute (although I'm
> not 100% sure that's all it could be used for). I thought of this
> approach, but it doesn't work because the thing that puts together the
> fetch request would have to know the actual PC. If it gets the filtered
> one like you're proposing, then the PAL bit will be masked out and the
> TLB will never see it. From experience, this makes the ALPHA_FS
> regressions just sit there and miss in the ITLB over and over. If there
> was some other way to communicate PAL-ness to the TLB, then yes I think
> that scheme would likely work.
>
>>> I don't have a better idea although I'm trying to think of one, but this
>>> definitely seems like something that can and should be improved on for
>>> all these reasons. One possible solution would be to store the actual
>>> PCState object in the request since that would at least be more general
>>> and allow getting the PAL bit out of the fetch address, but then we'd be
>>> carting around a handful of 64 bit values instead of just 1 when we
>>> still only care about a single bit only some of the time.
>>>
>> Perhaps the general PCState object should be broken up into sub
>> objects. I.e. a single PC object for a single PC that can encapsulate
>> information about that PC. The PCState object could be built from
>> those.
>>
>
> I'm not sure I follow.
> The PCState object for, say, Alpha is two Addrs,
> the PC and the NPC. For ARM it's the PC, NPC, microPC, nextMicroPC,
> flags, and nextFlags, for SPARC it's the PC, NPC, microPC, and
> nextMicroPC. I don't see how that can be broken up generically and not
> just undo the fact that there's a PCState object. I also don't see how
> that would address any of the problems above.
>
> Gabe
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
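
For readers following the thread, here is a rough sketch of why the quoted fetch check is harmless on Alpha but a problem elsewhere. It is not M5 code: Addr, fetchStopsForInterrupt, and all of the example PC values are hypothetical, and it assumes that bit 0 of the Alpha PC is the PAL flag and that Alpha instructions are 4-byte aligned. Only the condition itself is taken from the line quoted above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the simulator's address type.
using Addr = std::uint64_t;

// Mirrors the condition quoted from fetch_impl.hh: fetch stops when an
// interrupt is pending and the low two bits of the fetch PC are clear.
constexpr bool fetchStopsForInterrupt(bool interruptPending, Addr fetchPC)
{
    return interruptPending && !(fetchPC & 0x3);
}

// Assumption: with 4-byte aligned instructions, a set low bit on Alpha
// can only mean PAL mode. Bit 0 is assumed to be the PAL flag.
constexpr Addr alphaPalPC = 0x4001;     // hypothetical PAL mode PC
constexpr Addr alphaNormalPC = 0x4000;  // hypothetical non-PAL PC

// Assumption: a Thumb instruction at a 2-byte aligned address and an x86
// instruction at an arbitrary byte address. Neither is in any PAL mode.
constexpr Addr thumbPC = 0x8002;
constexpr Addr x86PC = 0x401003;

// Alpha: the check separates PAL code from non-PAL code as intended.
static_assert(!fetchStopsForInterrupt(true, alphaPalPC),
              "PAL mode code keeps fetching");
static_assert(fetchStopsForInterrupt(true, alphaNormalPC),
              "non-PAL code stops fetching");

// Other ISAs: ordinary PCs with low bits set look like PAL mode to the
// check, so fetch does not stop even though an interrupt is pending.
static_assert(!fetchStopsForInterrupt(true, thumbPC),
              "Thumb PC is misread as PAL mode");
static_assert(!fetchStopsForInterrupt(true, x86PC),
              "x86 PC is misread as PAL mode");
```

The static_asserts are compile-time checks, so the snippet needs no main() to verify. The same test on the low PC bits is what a masked fetch address would hide from the ITLB, which is the tension discussed in the thread.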
