I did a bit more cleanup and checked for low-hanging fruits for further optimization (didn't find any, see below), and I think the improved scheduler is now good for regular use:
http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/m1/perf/sched.c I just keep the LCPF (Longest Critical Path First) optimization enabled because it yields significantly better code at a tiny increase of total run time (about 50 ms over the whole patch set). The low-hanging fruits I've been looking for: 1) Improve the distance estimate used in LCPF. It currenly only considered write-read dependencies (i.e., if instruction X writes a register that's then read by Y, we can schedule Y only after X has completed), but not write-write dependencies (i.e., if X and Y write to the same register, X has to complete before Y completes). Adding this didn't make a difference. Perhaps that simply means such dependencies are not very common. 2) Improve the distance estimate, take two. The distance for write-read dependencies is off by one for each step. When I corrected this, the overall results were slightly worse. (Individual patches could end up being better or worse.) So I just left the old calculation. After all, it's just a heuristic :-) 3) I also tried my luck at rewriting history. In the presence of a read-write dependency, the scheduler currently only tries to issue the writing instruction after all readers of the previous register content have been issued. This is slightly sub-optimal, because we could schedule the writer already up to its latency before the last reader. I made the issue() function look for an earlier NOP slop for such cases, but it never found any. So I removed this again. I should probably document the whole scheduler in a white paper at some point in time, since many of the structures are a bit complicated and several seemingly straightforward things have exceptions that may not be obvious. I ran the whole patch compilation process (minus the last patch) and measured how much time is spent at individual stages. Of a total 3.9 seconds, about 1.0 seconds go to the scheduler (PFV and PVV). 0.3 seconds are spent reading the files, 1.1 seconds parsing them. About 1.4 seconds are burnt in finalize_pvv alone. finalize_pvv basically compiles a hard-coded script that adds some more calculations. So if we want to make patch compilation faster, the parser would be a worthwhile target, as expected. - Werner _______________________________________________ http://lists.milkymist.org/listinfo.cgi/devel-milkymist.org IRC: #milkymist@Freenode
