Don wrote:
Walter Bright wrote:
Don wrote:
In hand-coded asm, instruction scheduling still gives more than half of the benefit that it used to. But it's become ten times more difficult. You have to use Agner Fog's manuals, not Intel's or AMD's.

For example:
(1) a common bottleneck on all Intel processors is that you can only read from three registers per cycle, but you can also read from any register that has been modified in the last three cycles.
(2) it's important to break dependency chains.
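
To illustrate point (2), here is a minimal sketch in C rather than asm. The function names sum_chained and sum_split are made up for this example and are not taken from the BigInt code. In the first loop every add depends on the previous one; the second splits the work into two independent chains that an out-of-order core can overlap. A compiler may or may not make this transformation itself, so treat it as an illustration of the idea, not a measured result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: one accumulator, so each add must wait
   for the result of the add before it (a single dependency chain). */
uint64_t sum_chained(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical example: two accumulators form two independent
   chains, so adds from different chains can execute in parallel. */
uint64_t sum_split(const uint32_t *a, size_t n)
{
    uint64_t s0 = 0, s1 = 0;
    size_t i = 0;

    /* Process elements in pairs, one element per chain. */
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    /* Pick up the last element when n is odd. */
    if (i < n)
        s0 += a[i];

    /* Combine the two chains only once, at the end. */
    return s0 + s1;
}
```

Both functions return the same value for any input; only the shape of the dependency graph differs.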

On the BigInt code, instruction scheduling gave a speedup of ~40%.

Wow. I didn't know that. Do any compilers currently schedule this stuff?

Intel probably does. I don't think any others do a very good job. Agner told me that he had had no success in getting compiler vendors to be interested in his work.

Well, this one is. In fact, could we get Agner to actively help us out with this?

Any chance you want to take a look at cgsched.c? I had great success using the same algorithm for the quite different Pentium and P6 scheduling minutiae.

That would really be fun.
BTW, the current Intel processors are basically the same as the Pentium Pro, with a few improvements. The strange thing is that, because of all the reordering that happens, swapping the order of two (non-dependent) instructions makes no difference at all. So you always need to look at every instruction in a loop before you can do any scheduling.

I was looking at Agner's document, and it looks like ordering the instructions in the 4-1-1 or 4-1-1-1 pattern for optimal decoding could work. This would fit right in with the way the scheduler works.
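
As a rough sketch of why the 4-1-1 ordering matters, the C function below counts decode cycles under a deliberately simplified model. The model is an assumption for illustration only: per cycle, the first decoder takes one instruction of up to 4 uops, and each of the other two decoders takes one single-uop instruction. The name decode_cycles is hypothetical and has nothing to do with cgsched.c; instructions of more than 4 uops and instruction-length limits are ignored.

```c
#include <stddef.h>

/* Count decode cycles for a sequence of instructions, where uops[i]
   is the uop count of instruction i (assumed to be between 1 and 4).
   Simplified 4-1-1 model (an assumption, not a description of any
   specific CPU): decoder 0 accepts any instruction, decoders 1 and 2
   accept only single-uop instructions. */
size_t decode_cycles(const int *uops, size_t n)
{
    size_t cycles = 0;
    size_t i = 0;

    while (i < n) {
        /* The first instruction of each group goes to decoder 0. */
        i++;

        /* Fill decoders 1 and 2 with following single-uop instructions.
           A multi-uop instruction ends the group and waits for decoder 0
           in the next cycle. */
        for (size_t slot = 1; slot < 3 && i < n && uops[i] == 1; slot++)
            i++;

        cycles++;
    }
    return cycles;
}
```

Under this model the sequence {1, 1, 2, 1, 1, 2} needs three cycles, while the same instructions ordered as {2, 1, 1, 2, 1, 1} need two, which is the kind of gain a scheduler that orders for the 4-1-1 pattern would be after.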

I had thought that, with the CPU automatically reordering instructions, scheduling them was obsolete.
