I did a bit more cleanup and checked for low-hanging fruits for
further optimization (didn't find any, see below), and I think
the improved scheduler is now good for regular use:

http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/m1/perf/sched.c

I just keep the LCPF (Longest Critical Path First) optimization
enabled because it yields significantly better code at a tiny
increase of total run time (about 50 ms over the whole patch
set).

The low-hanging fruits I've been looking for:

1) Improve the distance estimate used in LCPF. It currenly only
   considered write-read dependencies (i.e., if instruction X
   writes a register that's then read by Y, we can schedule Y
   only after X has completed), but not write-write dependencies
   (i.e., if X and Y write to the same register, X has to
   complete before Y completes).

   Adding this didn't make a difference. Perhaps that simply
   means such dependencies are not very common.

2) Improve the distance estimate, take two. The distance for
   write-read dependencies is off by one for each step. When I
   corrected this, the overall results were slightly worse.
   (Individual patches could end up being better or worse.)

   So I just left the old calculation. After all, it's just a
   heuristic :-)

3) I also tried my luck at rewriting history. In the presence
   of a read-write dependency, the scheduler currently only
   tries to issue the writing instruction after all readers of
   the previous register content have been issued. 

   This is slightly sub-optimal, because we could schedule the
   writer already up to its latency before the last reader. I
   made the issue() function look for an earlier NOP slop for
   such cases, but it never found any. So I removed this again.

I should probably document the whole scheduler in a white paper
at some point in time, since many of the structures are a bit
complicated and several seemingly straightforward things have
exceptions that may not be obvious.

I ran the whole patch compilation process (minus the last patch)
and measured how much time is spent at individual stages. Of a
total 3.9 seconds, about 1.0 seconds go to the scheduler (PFV
and PVV). 0.3 seconds are spent reading the files, 1.1 seconds
parsing them. About 1.4 seconds are burnt in finalize_pvv alone.
finalize_pvv basically compiles a hard-coded script that adds
some more calculations.

So if we want to make patch compilation faster, the parser
would be a worthwhile target, as expected.

- Werner
_______________________________________________
http://lists.milkymist.org/listinfo.cgi/devel-milkymist.org
IRC: #milkymist@Freenode

Reply via email to