Ok... I removed the NoEffect versions from the exec context and changed the *Operand versions to just call their non-operand versions, which cleans up the code some. Alpha has no problem with these changes, but they do uncover a bug in SPARC. It's amazing that it ever worked.

The ASI register changes the type of instruction that is executing (e.g. one ASI is a type of prefetch vs. a load vs. a load with a different addressing mode, etc.). The types are decoded from the ASI bits in the ExtMachInst, which are inserted by the pre-decoder. The trouble is that these bits are inserted in "fetch", which is before rename, so the IsSerializing flags don't prevent it from happening. Thus, while the wrasi instruction can be marked as Serializing, fetch has already read the old value and created the following instructions based on the old ASI. When wrasi commits, the instructions built from the old ASI still flow through the rest of the pipeline, and that isn't great.

It's amazing that all the code sequences that Gabe used to run on the sparc/o3 model never ran into this issue. With the current code, if you executed alternating wrasi and load w/asi instructions you would get some very odd results.

Anyway, the question becomes: how do we fix this? My current solution is to add an instruction flag that flushes the CPU after the instruction commits. I don't know if this is enough, though, or what other surprises I'll run into.
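To make the hazard concrete, here is a toy Python model of it. This is only a sketch and not M5 code: the ASI values, the ASI_KINDS table, the fetch window size, and the run() function are all made up for illustration. Fetch stamps each load w/asi with whatever ASI is architecturally visible at fetch time, wrasi only updates the ASI at commit, and the optional flag squashes everything younger than wrasi and refetches it.

```python
from collections import deque

# Hypothetical ASI values and the kind of memory op each one selects.
ASI_KINDS = {0: "load", 1: "prefetch"}


def run(program, fetch_width=4, flush_on_wrasi=False):
    """Return how each ld_asi was decoded, in commit order."""
    asi = 0              # architectural ASI register, written at commit
    pc = 0               # index of the next instruction to fetch
    in_flight = deque()  # fetched but not yet committed
    committed = []
    while pc < len(program) or in_flight:
        # Fetch: fill the window; the pre-decoder bakes in the current ASI.
        while pc < len(program) and len(in_flight) < fetch_width:
            op, arg = program[pc]
            decoded = ASI_KINDS[asi] if op == "ld_asi" else None
            in_flight.append((pc, op, arg, decoded))
            pc += 1
        # Commit: retire the oldest in-flight instruction.
        idx, op, arg, decoded = in_flight.popleft()
        if op == "wrasi":
            asi = arg
            if flush_on_wrasi:
                # Squash everything younger and refetch after wrasi.
                in_flight.clear()
                pc = idx + 1
        else:
            committed.append(decoded)
    return committed


# Alternating wrasi and load w/asi, as described above.
program = [("wrasi", 1), ("ld_asi", None), ("wrasi", 0), ("ld_asi", None)]
stale = run(program)
fixed = run(program, flush_on_wrasi=True)
print(stale)
print(fixed)
```

Without the flush, both loads are decoded with the ASI that was live when they were fetched. With the flush after commit, each load is refetched and sees the value its preceding wrasi wrote. The model ignores rename, serialization, and everything else in the real pipeline, so it says nothing about whether the flush alone is sufficient there.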

Thoughts?

Ali


It seems like our notion of serializing came from the definition Alpha uses, and sometimes we need a stronger one. X86 is going to have similar issues in some cases, although I couldn't necessarily list them for you offhand. The "nuke everything" flag you're proposing might be the best solution because I doubt we'd ever need anything stronger than that. Maybe you could make the CPU stop fetching too, but I don't see how that would be useful and it's probably very hard to do.

This also highlights the usefulness of targeted tests of particular features, like changing the ISA and then immediately using it, as opposed to getting specific workloads to work. The compiler, code author, etc., only want to achieve a functional result, and they'll probably use the same structures and features over and over again since those work well, are equivalently good to the other options, etc. There are swathes of x86, which granted is very large, that aren't implemented at all, and Linux boots just fine, but depending on how picky you are you could say those areas are severely broken. The same thing could be happening less intentionally elsewhere.

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
