When discussing the long time that passes between starting a Milkymist and the appearance of the first patch, patch compilation was highlighted as one source of trouble.
I had a look at the compiler and found that the scheduler, the program that places the GFPU instructions in a suitable order and performs register allocations, uses a very clear and understandable but unfortunately also rather inefficient algorithm. I've now written a drop-in replacement for gfpus.c that is faster and that also produces a more efficient schedule most of the time. My scheduler is here: http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/m1/perf/sched.c It has an optional optimization capability that takes a bit of CPU time and usually produces better code than the version without optimization. What it does is that it tries to prefer operations that have a long chain of further operations depend on them, so that we have a better chance of having a supply of operations to fill delay slots. I've tested that it generates code that is functionally equivalent to the original by building a minimum patch compilation environment on my x86-64 PC and comparing the content of the fixed registers at the end of the code sequence. Here's the script that reads the COMP_DEBUG output and determines what a code sequence does: http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/m1/perf/eval.pl Here is a performance comparison between the original, the new scheduler without optimization, and the new scheduler with optimization (on a Q6600, without -O, with -pg): http://downloads.qi-hardware.com/people/werner/m1/perf/chart-20110924.html "Time" is the run time for 10'000 compilation cycles. "Size" is the size of the instruction sequence, "Eff" the efficiency respective to the idealized FPVM code (see 6.4 section of Sebastien's thesis for details), "Regs" the number of PFPU registers allocated, "Equiv" the check for functional equivalence, and "Name" a sanitized version of the patch file's name. On the M1, I've obtained the following run times for compiling all the patches but the last one (it was also compiled, but I'm too lazy to get the time for it ;-): Original 10.5 s New scheduler 4.0 s With optimization 4.1 s The difference is less dramatical than in the performance chart. I suspect that the parsing overhead could play a bigger role on the M1 than in my test environment. The parser also offers rich potential for optimizations. With my scheduler, the total time between starting the M1 and the first patch drops from about 35 seconds to about 29 seconds. Quite a lot of time (~16 s) seems to be spent waiting for BOOTP. I'm using Flickernoise stable_1.0. Not sure if the BOOTP wait has been fixed in later versions. - Werner _______________________________________________ http://lists.milkymist.org/listinfo.cgi/devel-milkymist.org IRC: #milkymist@Freenode
