How does switching in and out of PAL mode work? Could we take advantage
of that somehow? I don't have a really good idea of how that might help.
I'm just throwing the idea out there.

Gabe

Gabe Black wrote:
> nathan binkert wrote:
>   
>>> I'm trying to factor out PAL mode from O3, and I found the following
>>> line:
>>>
>>> http://repo.m5sim.org/m5/file/fc12f4d657f0/src/cpu/o3/fetch_impl.hh#l579
>>>
>>>    } else if (interruptPending && !(fetch_PC & 0x3)) {
>>>
>>>
>>> which is basically making fetch stop fetching once it sees that an
>>> interrupt is pending and it's no longer fetching PAL mode code. Why is
>>> that there? If interrupts are blocked while in PAL mode, why doesn't O3
>>> just rely on commit waiting until committed state is out of PAL mode and
>>> then start handling the interrupt?
>>>     
>>>       
>> Interrupts cannot be taken during PAL mode.  I'm not sure how this
>> would be handled from commit.  What needs to happen is that fetch
>> cannot be redirected to the interrupt handler until the CPU exits PAL
>> mode (i.e. the PAL routine needs to finish without being interrupted
>> at all).
>>   
>>     
>
> I think what happens is that commit asks the Interrupts object whether
> there's an interrupt for it to service. If so, commit gets it and calls
> its invoke method, which sets state through the thread context; that
> flushes out the rest of the machine and restarts execution. Fetch is
> just along for the ride, and the Interrupts object is already deciding
> when it's appropriate to take an interrupt.
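To make that flow concrete, here is a minimal C++ sketch of commit-driven interrupt handling. Everything in it is a hypothetical stand-in rather than the real M5 classes: the ThreadContext and Interrupts structs, their members, commitServiceInterrupts(), and the assumption that bit 0 of the committed PC marks PAL mode (an Alpha-like rule) are all invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical stand-in for committed architectural state.
struct ThreadContext {
    Addr pc = 0;
    bool pipelineFlushed = false;

    // Writing state through the thread context flushes younger work.
    void setPC(Addr newPC) { pc = newPC; pipelineFlushed = true; }
};

// Hypothetical ISA-specific interrupt object (Alpha-like rule assumed).
struct Interrupts {
    bool pending = false;
    Addr handlerPC = 0;

    // Decide from committed state whether an interrupt may be taken now.
    bool checkInterrupts(const ThreadContext &tc) const
    {
        const bool inPalMode = (tc.pc & 0x1) != 0;  // assumed PAL flag
        return pending && !inPalMode;
    }

    // Redirect the thread to the handler and clear the request.
    void invoke(ThreadContext &tc)
    {
        tc.setPC(handlerPC);
        pending = false;
    }
};

// Commit-stage hook: returns true if an interrupt was taken.
bool commitServiceInterrupts(Interrupts &intr, ThreadContext &tc)
{
    if (!intr.checkInterrupts(tc))
        return false;
    intr.invoke(tc);
    return true;
}

// Usage: the interrupt waits while committed state is in PAL mode.
void demoCommitInterrupts()
{
    Interrupts intr;
    intr.pending = true;
    intr.handlerPC = 0x4001;          // handler itself runs in PAL mode

    ThreadContext tc;
    tc.pc = 0x2001;                   // committed PC is in PAL mode
    assert(!commitServiceInterrupts(intr, tc));

    tc.pc = 0x3000;                   // PAL routine has finished
    assert(commitServiceInterrupts(intr, tc));
    assert(tc.pc == 0x4001 && tc.pipelineFlushed);
}
```

In this sketch fetch never looks at the interrupt state at all; it is simply redirected when invoke() changes the committed PC.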
>
>   
>>   
>>     
>>> This behavior would be very annoying
>>> and perhaps impossible to preserve in an ISA agnostic way and I'd really
>>> like to just get rid of it, but if there's a good reason for it I'll
>>> have to be more creative.
>>>     
>>>       
>> We could have a more generic ISA-specific function
>> areInterruptsEnabled() which would be inlined to return true in most
>> ISAs.
>>   
>>     
>
> Interrupts aren't going to be universally enabled in most ISAs, and the
> state they use to decide that is going to usually be different than the
> PC. They would need to read that state through the thread context which
> reflects committed state, and the Interrupts object is already doing
> that in the commit stage.
>
>   
>>   
>>     
>>> It innocuously sits there inert for SPARC and
>>> MIPS, but for ARM in thumb mode with 16 bit instructions and especially
>>> X86 where the PC can be anything with no special meaning this will cause
>>> a serious problem. This is something I'd like to have straightened out
>>> in the short to medium term.
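As a rough illustration of the problem, the C++ sketch below reproduces only the quoted condition in isolation. The function name fetchStallsForInterrupt() and the example PC values are made up for this sketch; the only thing taken from the source is the expression `interruptPending && !(fetch_PC & 0x3)`.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Mirrors the quoted condition: fetch stops when an interrupt is pending
// and neither of the two low PC bits is set.
bool fetchStallsForInterrupt(bool interruptPending, Addr fetchPC)
{
    return interruptPending && !(fetchPC & 0x3);
}

// Usage: example PCs are hypothetical, chosen to show each case.
void demoFetchCheck()
{
    // 4-byte-aligned PC with no low bits set: fetch stops as intended.
    assert(fetchStallsForInterrupt(true, 0x120000000ULL));

    // Low bit set (the Alpha PAL flag): fetch keeps going.
    assert(!fetchStallsForInterrupt(true, 0x4001));

    // A 2-byte-aligned PC, as with 16 bit Thumb instructions, has bit 1
    // set and is treated the same way as PAL code.
    assert(!fetchStallsForInterrupt(true, 0x8002));

    // An arbitrary byte-aligned PC, as on x86, is misread as well.
    assert(!fetchStallsForInterrupt(true, 0x401003));
}
```

With fixed 4-byte instructions and no flag bits in the PC the low bits are always zero, so the check reduces to interruptPending alone. Once the low bits carry ordinary address information, the result depends on where the code happens to sit.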
>>>     
>>>       
>> Seems perfectly reasonable and not too difficult to me.
>>
>>   
>>     
>>> Longer term I see a problem with the way PAL mode is communicated to the
>>> ITLB as well. First, the PC is passed as part of a request as a single
>>> address. This would make sense if the PC was the actual memory address
>>> used for the fetch, but that's already in the vaddr field. Generalized
>>> PCs have more structure than that. Second, this is the mangled PC which
>>> is only there to signal PAL mode in Alpha and has no current use in any
>>> other ISA. Even in Alpha only one bit actually matters. Carting this
>>> value around for this one (admittedly important) use is a waste. This is
>>> slightly a lie because the stride prefetcher also uses the PC field, but
>>> it looks like it could (and realistically would) use the paddr or vaddr
>>> instead. Third, its use depends on the fact that the magical property
>>> communicated by the PC carries over to all the instructions being
>>> fetched as part of that access, or in other words effectively the cache
>>> line. This is something that happens to work for Alpha but won't work
>>> generally. Fourth, if the PAL mode bit is pulled out of the fetch address
>>> used by the CPUs so that they truly deal with memory addresses and not
>>> architectural PC values, the architectural PC, and effectively the
>>> PAL-ness of the Alpha PC, won't be accessible to the ITLB anymore. It
>>> looks like we can either have the PC based PAL mode bit in Alpha or more
>>> complete CPU model abstraction universally but not both.
>>>     
>>>       
>> I'm not clear on why Alpha couldn't keep the PAL mode bit in the PC.
>> Seems that any ISA independent code (i.e. the CPU models) when asking
>> for the PC should get the masked value.  The ISA dependent code (i.e.
>> stuff in arch/alpha) should get the unfiltered one.  Functions like
>> areInterruptsEnabled would be machine dependent and would have access
>> to the ExecContext and the PC structure to grab the PAL mode bit.
>>   
>>     
>
> The ExecContext is only for instructions when they execute (although I'm
> not 100% sure that's all it could be used for). I thought of this
> approach, but it doesn't work because the thing that puts together the
> fetch request would have to know the actual PC. If it gets the filtered
> one like you're proposing, then the PAL bit will be masked out and the
> TLB will never see it. From experience, this makes the ALPHA_FS
> regressions just sit there and miss in the ITLB over and over. If there
> was some other way to communicate PAL-ness to the TLB, then yes I think
> that scheme would likely work.
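A small C++ sketch of why the filtered PC breaks the ITLB lookup, under stated assumptions: FetchRequest, filteredPC(), buildFetchRequest(), and itlbSeesPalMode() are hypothetical names, and bit 0 is assumed to be the PAL flag. The real request and TLB code are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

constexpr Addr PalBit = 0x1;  // assumed position of the PAL flag

// Hypothetical request: the memory address plus the PC carried alongside.
struct FetchRequest {
    Addr vaddr;
    Addr pc;
};

// What ISA-independent code would see under the masking proposal.
Addr filteredPC(Addr archPC) { return archPC & ~PalBit; }

// Whatever builds the request can only forward the PC it was given.
FetchRequest buildFetchRequest(Addr pcSeenByCpu)
{
    return FetchRequest{pcSeenByCpu & ~Addr(0x3), pcSeenByCpu};
}

// The ITLB's only view of PAL mode is the PC field of the request.
bool itlbSeesPalMode(const FetchRequest &req)
{
    return (req.pc & PalBit) != 0;
}

// Usage: the same architectural PC, unfiltered and filtered.
void demoFilteredPC()
{
    const Addr archPC = 0x4001;  // hypothetical PAL mode PC
    assert(itlbSeesPalMode(buildFetchRequest(archPC)));
    assert(!itlbSeesPalMode(buildFetchRequest(filteredPC(archPC))));
}
```

The second assertion is the failure described above: once the mask is applied before the request is built, the TLB has no way to recover the bit.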
>
>   
>>   
>>     
>>> I don't have a better idea although I'm trying to think of one, but this
>>> definitely seems like something that can and should be improved on for
>>> all these reasons. One possible solution would be to store the actual
>>> PCState object in the request since that would at least be more general
>>> and allow getting the PAL bit out of the fetch address, but then we'd be
>>> carting around a handful of 64 bit values instead of just 1 when we
>>> still only care about a single bit only some of the time.
>>>     
>>>       
>> Perhaps the general PCState object should be broken up into sub
>> objects.  I.e. a single PC object for a single PC that can encapsulate
>> information about that PC.  The PCState object could be built from
>> those.
>>   
>>     
>
> I'm not sure I follow. The PCState object for, say, Alpha is two Addrs,
> the PC and the NPC. For ARM it's the PC, NPC, microPC, nextMicroPC,
> flags, and nextFlags; for SPARC it's the PC, NPC, microPC, and
> nextMicroPC. I don't see how that can be broken up generically and not
> just undo the fact that there's a PCState object. I also don't see how
> that would address any of the problems above.
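For reference, the three layouts listed above could be sketched in C++ roughly as follows. The field lists come from the message; the struct names, the field names, and the widths chosen for the micro PCs and flags are assumptions made only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;
using MicroPC = std::uint16_t;  // assumed width, for illustration only

// Alpha: two Addrs, the PC and the NPC.
struct AlphaPCState {
    Addr pc;
    Addr npc;
};

// SPARC: PC, NPC, microPC, and nextMicroPC.
struct SparcPCState {
    Addr pc;
    Addr npc;
    MicroPC upc;
    MicroPC nupc;
};

// ARM: PC, NPC, microPC, nextMicroPC, flags, and nextFlags.
struct ArmPCState {
    Addr pc;
    Addr npc;
    MicroPC upc;
    MicroPC nupc;
    std::uint8_t flags;      // assumed width
    std::uint8_t nextFlags;  // assumed width
};

// Even the smallest layout is twice the size of a single address.
static_assert(sizeof(AlphaPCState) == 2 * sizeof(Addr),
              "Alpha PCState is exactly two addresses");

// Usage: compare what a request would carry in each case.
void demoPCStateSizes()
{
    assert(sizeof(SparcPCState) > sizeof(AlphaPCState));
    assert(sizeof(ArmPCState) >= sizeof(SparcPCState));
}
```

The three structs share only pc and npc, which is why splitting PCState into generic per-PC sub-objects is not obvious, and storing any of them in a request costs at least two addresses to convey one bit.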
>
> Gabe
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>   
