Trimmed, responses inline.

On Tue, Jan 23, 2018 at 2:23 AM, Andreas Sandberg <[email protected]> wrote:
> On 22/01/2018 23:53, Gabe Black wrote:
>
> It isn't really undermining the idea of having multiple ISAs in the same
> binary. The biggest "problem" is that you won't be able to use a detailed
> timing model for one ISA with another ISA. That's probably fine though. I'm
> pretty sure it would be hard to make a timing model that is realistic for
> both Arm and x86, given that we would most likely classify instructions in
> slightly different ways.

It's problematic to have features which fundamentally don't work for a
particular ISA, although that's not what I'm talking about. I'm talking
about the #ifdef which is based on the arch, since the X86 ExtMachInst
can't be &-ed against and won't compile that way.

> The problem is that different microarchitectures would need different
> classifications (i.e., different uarchs route the same instructions
> differently) and some instructions don't have fixed latencies (e.g.,
> division). We could grow the number of instruction classes, but that
> probably won't scale since each new microarchitecture we want to model
> would need new instruction classes. Similarly, each time we implement a
> new divider (or some other clever gadget), we'd need to implement a custom
> timing model in C++.

Yeah, I misspoke when I said latencies. What I should have said was that
the instructions would be classified into groups, and then those groups
would be matched against by functional units in the CPU model instead of
using bitfields.

> > Alternatively, you could make your decoder programmable, where it would
> > take some sort of classifier which would apply groupings in the decoder
> > itself? This would also make the CPU model more efficient, since the
> > instructions aren't going to change groups, but they still get
> > reclassified every time they execute.
>
> That's definitely a possibility, but if we make that classifier a part of
> the C++ world, we effectively encode parts of our timing models in C++.
> That would be highly undesirable and would make it a lot harder to make
> and distribute new custom timing models. This sort of mechanism could work
> if we make instruction classification programmable from Python and add the
> ability to define custom instruction classes in Python (I'm not sure how
> different it would be from what we do currently though). It wouldn't solve
> the issue for variable latency instructions though.

What I was thinking is that instead of having a pseudo-ISA-independent
mechanism living in the CPU, which is really a second decoder, you could
have the same mechanism live in the ARM decoder and just tag instructions
with groups. So instead of saying this unit works with instructions where
i & 0xf == 0xa, you'd say instructions where i & 0xf == 0xa go in group 2,
and functional units 1, 3 and 5 act on group 2. Then the CPU model is
generic, since it's just operating on group numbers which are totally
artificial and independent of the ISA, and the ISA-specific part (grouping
the instructions) is in the decoder, which is already inherently very ISA
dependent. By making an instance of the decoder programmable, the decoder
for CPU X can be set up to group instructions differently than CPU Y's, in
a way equivalent to how CPU X's functional units used to claim
instructions.

Gabe

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
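[Editor's sketch] The group-tagging scheme described in this thread can be illustrated with a minimal Python sketch. All names here (GroupClassifier, FuncUnit, the example rules and unit names) are hypothetical and are not gem5's actual decoder or CPU-model API; the sketch only shows the idea: the decoder applies per-CPU mask/match rules to tag each instruction with an artificial group number, and functional units claim instructions by group number alone, keeping the CPU model ISA independent.

```python
class GroupClassifier:
    """Hypothetical decoder-side classifier: maps machine-instruction
    encodings to artificial group numbers via (mask, match, group) rules,
    e.g. "instructions where i & 0xf == 0xa go in group 2"."""

    def __init__(self):
        self.rules = []

    def add_rule(self, mask, match, group):
        self.rules.append((mask, match, group))

    def classify(self, mach_inst):
        # First matching rule wins; unmatched instructions fall into group 0.
        for mask, match, group in self.rules:
            if mach_inst & mask == match:
                return group
        return 0


class FuncUnit:
    """Hypothetical functional unit that claims instructions by group
    number only, so the CPU model never touches ISA-specific bits."""

    def __init__(self, name, groups):
        self.name = name
        self.groups = set(groups)

    def accepts(self, group):
        return group in self.groups


# Per-CPU configuration: CPU X's decoder instance can group instructions
# differently than CPU Y's, mirroring how X's units used to claim them.
classifier = GroupClassifier()
classifier.add_rule(0xf, 0xa, 2)  # i & 0xf == 0xa -> group 2

units = [
    FuncUnit("unit1", [2]),     # units 1, 3 and 5 act on group 2
    FuncUnit("unit2", [1]),
    FuncUnit("unit3", [2, 4]),
    FuncUnit("unit5", [2]),
]

group = classifier.classify(0x3a)  # 0x3a & 0xf == 0xa -> group 2
eligible = [u.name for u in units if u.accepts(group)]
```

With these assumed rules, the encoding 0x3a lands in group 2 and is eligible for units 1, 3 and 5, while an encoding matching no rule falls into the default group; reprogramming the classifier for a different CPU changes the grouping without touching the generic CPU model.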
