Don wrote:
Walter Bright wrote:
Don wrote:
In hand-coded asm, instruction scheduling still gives more than half
of the benefit it used to. But it's become ten times more difficult.
You have to use Agner Fog's manuals, not Intel's or AMD's.
For example:
(1) a common bottleneck on all Intel processors is that you can only
read from three registers per cycle, but reads of any register which
has been modified in the last three cycles don't count toward that limit.
(2) it's important to break dependency chains.
On the BigInt code, instruction scheduling gave a speedup of ~40%.
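
To illustrate point (2), here is a minimal C sketch of breaking a dependency chain. It is not taken from the BigInt code; the function names are hypothetical, and whether it helps at all depends on the compiler and the CPU. The first loop carries one long chain through a single accumulator; the second splits the work across two accumulators that do not depend on each other.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add must wait for the result of the previous add. */
uint64_t sum_single_chain(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: s0 and s1 form two independent chains,
   so an out-of-order core is free to overlap the two adds. */
uint64_t sum_two_chains(const uint32_t *a, size_t n)
{
    uint64_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)              /* odd element left over */
        s0 += a[i];
    return s0 + s1;
}
```

Both functions return the same value because integer addition is associative. The same rewrite on floating-point data can change the rounding, so it is not always a safe transformation there.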
Wow. I didn't know that. Do any compilers currently schedule this stuff?
Intel probably does. I don't think any others do a very good job. Agner
told me that he had had no success in getting compiler vendors to be
interested in his work.
Well, this one is. In fact, could we get Agner to actively help us out with
this?
Any chance you want to take a look at cgsched.c? I had great success
using the same algorithm for the quite different Pentium and P6
scheduling minutiae.
That would really be fun.
BTW, the current Intel processors are basically the same as Pentium Pro,
with a few improvements. The strange thing is, because of all of the
reordering that happens, swapping the order of two (non-dependent)
instructions makes no difference at all. So you always need to look at
every instruction in a loop before you can do any scheduling.
I was looking at Agner's document, and it looks like ordering the instructions
in the 4-1-1 or 4-1-1-1 pattern for optimal decoding could work. This would fit
right in with the way the scheduler works.
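
As a rough illustration of the 4-1-1 idea, here is a small C sketch of a deliberately simplified decoder model. It is an assumption-laden toy, not what cgsched.c does: the function name is hypothetical, every instruction is assumed to decode to between 1 and 4 uops, and fetch limits, instruction lengths and microcoded instructions are all ignored. In each cycle the first decoder takes the next instruction whatever its size, and the other two decoders take an instruction only if it is a single uop.

```c
#include <stddef.h>

/* Count decode cycles for a sequence of per-instruction uop counts
   under a simplified 4-1-1 model (three decoders, in program order). */
size_t decode_cycles_411(const int *uops, size_t n)
{
    size_t cycles = 0;
    size_t i = 0;
    while (i < n) {
        cycles++;
        i++;    /* first decoder: accepts the next instruction (1..4 uops) */
        /* second and third decoders: single-uop instructions only */
        for (int slot = 0; slot < 2 && i < n && uops[i] == 1; slot++)
            i++;
    }
    return cycles;
}
```

In this model the uop sequence 2,2,1,1,1,1 takes three cycles, because the second multi-uop instruction has to wait for the first decoder. The reordered sequence 2,1,1,2,1,1 takes two. A scheduler that already picks among ready instructions could use the slot position as one more tie-breaker.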
I had thought that, with the CPU automatically reordering instructions,
scheduling them was obsolete.