So, yeah, we're drifting pretty far off-topic now. If you have follow-up questions that you think aren't interesting for gmp-devel, just send them to me privately.
On Mon, Aug 18, 2014 at 2:35 PM, Niels Möller <ni...@lysator.liu.se> wrote:
> I think the decoder could implement the prefix instruction, as I've
> defined it, in that way, treating a sequence of prefix instructions +
> non-prefix instruction as an indivisible longer instruction. Supervisor
> mode/kernel mode code might need to know if there really is a prefix
> register or not, but otherwise, it's an implementation detail not
> visible to user code.

I don't agree. The point of your approach is that the shift instruction is NOT a prefix, but a "proper" instruction. Thus, other instructions can come between it and the instruction which finally consumes the shift flag. So you can't treat the "shift op + consumer op" as an indivisible whole, the way you can with variable-length instructions.

This is the key difference. Either you say it's a prefix op (and thus indivisible from the consuming op, essentially a variable-length op) or you let "shift-in-a-constant" be a proper instruction. You chose the latter. Thus, you need to track the shift state all the way through the pipeline so that you can abort it.

> If you get a page fault from a reordered load or store, or some
> other exception associated with the execution of a particular
> instruction, how do you stop the instruction flow at the correct point
> before the control transfer to the handler?

So your outline is more-or-less right. There are different approaches to this problem, depending on your goals. My goals are: exceptions can be slow, cancelled reads are allowed to be visible outside of the CPU, cancelled writes are not. Thus, I forbid a write from even being reordered before an unconfirmed read or branch. This is not particularly difficult, since I only have a single load/store unit due to resource constraints and I have to watch for RAW conflicts through memory anyway.

To cancel the other operations, I have two register renaming maps (from architectural registers to backing registers).
One is at the front of the pipeline, near the decode stage. The other is at the end of the pipeline, at the retire stage. When an instruction enters the OoO window, I update the front map. When an instruction is retired (leaves the OoO window), I update the back map. Instructions only leave the OoO window in order, once they've completed. Obviously, I can retire and load multiple instructions per cycle.

If an exception occurs, I just tag the instruction as killed. When it reaches the retire stage, I destroy everything in the OoO window and force the rename stage to load the retire stage's rename map. This stops processing of any remaining instructions and reverts the writes of any that were already done. I use this scheme for branch misprediction too. In all cases, I then supply the PC that was associated with the killed instruction to the fetch unit. This is where I would need to resupply the "shift register" state in your ISA as well.

It sounds expensive, but after the CPU has been running a bit without exceptions, you generally have stuff waiting at the bottom of the OoO window from whatever the critical path through the code is anyway. There are other approaches that are faster. If it turns out to be too slow for misprediction, I may revise my plan. Mine is very simple, though, which makes it good for a first version on an FPGA.

_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
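
For anyone who wants to play with the two-map recovery scheme described in the message, here is a minimal Python sketch of it. It is a toy software model, not the actual hardware design. The names `Inst` and `RenameModel`, the free list of backing registers, and the register counts are all assumptions made for illustration; the message itself only describes the front map, the retire map, the in-order retire, and the flush.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Inst:
    pc: int               # program counter of this instruction
    dest: int             # architectural destination register
    phys: int = -1        # backing register, assigned at rename
    done: bool = False    # set when execution has completed
    killed: bool = False  # set on an exception or a misprediction


class RenameModel:
    """Toy model (hypothetical) of a front rename map plus a retire map."""

    def __init__(self, n_arch: int, n_phys: int):
        # Assumption: architectural register i starts in backing register i.
        self.front = list(range(n_arch))   # map near the decode stage
        self.back = list(range(n_arch))    # map at the retire stage
        # Assumption: unused backing registers are kept on a free list.
        self.free = deque(range(n_arch, n_phys))
        self.window = deque()              # OoO window, in program order

    def issue(self, inst: Inst) -> None:
        """Instruction enters the OoO window: update the front map."""
        inst.phys = self.free.popleft()
        self.front[inst.dest] = inst.phys
        self.window.append(inst)

    def retire(self):
        """Retire from the head, in order. Returns the restart PC after a
        flush, or None if no killed instruction reached the retire stage."""
        while self.window:
            head = self.window[0]
            if head.killed:
                return self._flush(head.pc)
            if not head.done:
                break
            self.window.popleft()
            # The previously committed backing register can be reused.
            self.free.append(self.back[head.dest])
            self.back[head.dest] = head.phys

        return None

    def _flush(self, pc: int) -> int:
        """Destroy the window and reload the front map from the retire map."""
        for inst in self.window:
            self.free.append(inst.phys)    # undo speculative allocations
        self.window.clear()
        self.front = list(self.back)
        return pc                          # handed to the fetch unit


model = RenameModel(n_arch=4, n_phys=8)
a, b, c = Inst(0x100, dest=1), Inst(0x104, dest=2), Inst(0x108, dest=1)
for inst in (a, b, c):
    model.issue(inst)
a.done = True
b.killed = True   # e.g. a page fault on b
c.done = True     # c finished out of order, but must not become visible
restart_pc = model.retire()
```

In this run `a` retires normally, `b` triggers the flush when it reaches the head of the window, and the speculative mapping created by `c` is discarded because the front map is overwritten with the retire map. Any extra front-end state, such as the "shift register" discussed above, would have to be restored at the same point as the restart PC.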