Simon Burge writes:
>I'm curious - there seem to be definite patterns in the sections of
>code (especially when comparing the different length radix subroutines).
>Is any of this code machine generated, or is it all hand generated?
All hunt-and-peck hand-generated. Of course many sections of code are
very similar, so much was created via cut-paste-modify. That's what I
meant about the out-of-place transform strategy being nice from a coding
and debugging (and now, speed) perspective.
The best place to read about the speedups and view the evolution of the
code is the program header.
>In the past I attempted to get a C version of your code working but didn't
>quite get there, and then Gord Palameta did a version last year. I was
>thinking of taking a stab at trying 2.6b again...
I agree, it would be nice to have a C version, but as long as there are
executables for the major platforms, it's not crucial. I'll have a Linux
binary hopefully by the middle of next week. Since we have Alpha Unix and SGI
Irix binaries, the major voids are SPARC and the PowerPC (including the G4
with its AltiVec vector unit). Hopefully when Alex Kruppa returns
to a quasi-normal routine (and passes his TU München exams), one of the
three f90 compilers he said he'd try on his Ultra will prove to be a decent
one.
Of course, PPC and SPARCers can (and should) use MacLucasUnix in the meantime.
There's also Jason Papadopoulos' potential LL code, but even with lots of
encouragement it'll likely take him months to modify his Fermat number code
appropriately, and even longer to add non-power-of-2 runlengths. As anyone
who has coded such algorithms knows, it's highly nontrivial, especially
when you need the code to not just run, but to run fast. (I've been working
on Mlucas for nearly three years - of course much of year 1 was spent just
learning the basics of FFTs, the LL test, and the DWT.)
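For anyone new to this, here is a minimal sketch of the LL test itself. It is not Mlucas code: Python's built-in big integers stand in for the FFT/DWT-based squaring that a fast implementation needs, and the function name is purely illustrative.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p is itself prime; the test is only meaningful in that case.
    """
    if p == 2:
        return True  # M2 = 3 is prime; the recurrence below needs odd p
    m = (1 << p) - 1  # the Mersenne number 2**p - 1
    s = 4             # standard LL starting value
    for _ in range(p - 2):
        # One LL iteration: square, subtract 2, reduce mod 2**p - 1.
        # A production code does this squaring with a floating-point
        # FFT and a DWT weighting; here plain big-integer arithmetic is used.
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 31):
        print(p, lucas_lehmer(p))
```

The squaring inside the loop is where essentially all the runtime goes, which is why the transform code gets so much attention.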
>Here are some more times with a few explanatory notes below:
Thanks! Those look very good, no more than a few percent slower than
MacLucasUnix at any power-of-2 runlength, and quite a bit faster at most
of the intermediate lengths and at 1024K.
>[4] - The speed increase in secs/iter between my binary and Ernst's
> binary. Ernst - what compiler options did you use?
I didn't use -arch ev6, used -O4 rather than -O5 (-O5 runs slower on my
21064, where I compile) and didn't use -fast (also a tad slower on my ev4).
I'll compile using your options (at least for the ev6 executable) sometime
in the next few days. The difference is slight, but I know every bit counts;
e.g. for my Fermat-DWT code, every 1% speedup saved nearly 2 days of runtime
on F24 (on a 250 MHz MIPS R10000).
Cheers and happy hunting,
Ernst